AD-A212  365 




Research  Product  89-20 


Questionnaire  Construction  Manual 




June  1989 


Fort  Hood  Field  Unit 
Systems  Research  Laboratory 


U.S.  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences 

Approved for public release; distribution is unlimited.




U.S.  ARMY  RESEARCH  INSTITUTE 

FOR  THE  BEHAVIORAL  AND  SOCIAL  SCIENCES 


A  Field  Operating  Agency  Under  the  Jurisdiction 
of  the  Deputy  Chief  of  Staff  for  Personnel 


EDGAR  M.  JOHNSON 
Technical  Director 


JON  W.  BLADES 
COL,  IN 
Commanding 


Research  accomplished  under  contract 
for  the  Department  of  the  Army 

Essex  Corporation 


Technical  review  by 
David  Meister 

W.F.  Moroney,  MSC,  U.S.  Navy 


NOTICES 

FINAL DISPOSITION: This Research Product may be destroyed when it is no longer needed.
Please do not return it to the U.S. Army Research Institute for the Behavioral and Social Sciences.


NOTE: This Research Product is not to be construed as an official Department of the Army
document, unless so designated by other authorized documents.




REPORT DOCUMENTATION PAGE
Form Approved, OMB No. 0704-0188

1a. REPORT SECURITY CLASSIFICATION: Unclassified

3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release;
distribution is unlimited.

5. MONITORING ORGANIZATION REPORT NUMBER(S): ARI Research Product 89-20

6a. NAME OF PERFORMING ORGANIZATION: Essex Corporation, Human Factors &
Training

6c. ADDRESS (City, State, and ZIP Code): 741 Lakefield Road, Suite B,
Westlake Village, CA 91361

7a. NAME OF MONITORING ORGANIZATION: U.S. Army Research Institute for the
Behavioral and Social Sciences

7b. ADDRESS (City, State, and ZIP Code): ARI Field Unit at Fort Hood,
HQ TCATA (PERI-SH), Fort Hood, TX 76544

8a. NAME OF FUNDING/SPONSORING ORGANIZATION: U.S. Army Research Institute
for the Behavioral and Social Sciences

8b. OFFICE SYMBOL (If applicable): PERI-ZA

8c. ADDRESS (City, State, and ZIP Code): 5001 Eisenhower Avenue,
Alexandria, VA 22333-5600

9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER: MDA903-83-C-0033

10. SOURCE OF FUNDING NUMBERS: Program Element No. 63739A; Project No. 793;
Task No. 321; Work Unit Accession No. 0

11. TITLE (Include Security Classification): Questionnaire Construction
Manual

12. PERSONAL AUTHOR(S): Babbitt, Bettina A. (Essex Corporation), and
Nystrom, Charles O. (ARI)

13a. TYPE OF REPORT: Final

13b. TIME COVERED: From 84/12 to 85/03

14. DATE OF REPORT (Year, Month, Day): 1989, June

15. PAGE COUNT: 777

16. SUPPLEMENTARY NOTATION: This is a revised version of the July 1976
Questionnaire Construction Manual, P-77-1, originally authored by
R. F. Dyer, J. J. Mathews, C. E. Wright, and K. L. Yudowitch, Operations
Research Associates, Palo Alto, CA. Charles O. Nystrom is the Contracting
Officer's Representative.

18. SUBJECT TERMS: Questionnaire construction; Scaling techniques;
Questionnaire administration; Response anchoring; Attitude scales;
Response alternatives; Pretesting questionnaires; Survey interviews

19. ABSTRACT: This Questionnaire Construction Manual is a revised version
of a 1976 manual. The latest research methods for developing
questionnaires are presented. The manual was designed to guide individuals
who develop and/or administer questionnaires as part of Army field tests
and evaluations. The content is applicable to many nonmilitary
applications.

20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: Same as report

21. ABSTRACT SECURITY CLASSIFICATION: Unclassified

22a. NAME OF RESPONSIBLE INDIVIDUAL: Charles O. Nystrom

22b. TELEPHONE (Include Area Code): (817) 288-9118

22c. OFFICE SYMBOL: PERI-SH

DD Form 1473, JUN 86. Previous editions are obsolete.

SECURITY CLASSIFICATION OF THIS PAGE: UNCLASSIFIED


Research  Product  89-20 


Questionnaire Construction Manual

Bettina  A.  Babbitt 

Essex  Corporation 

and 

Charles  O.  Nystrom 

U.S.  Army  Research  Institute 


Field  Unit  at  Fort  Hood,  Texas 
George  M.  Gividen,  Chief 

Systems  Research  Laboratory 
Robin  L.  Keesee,  Director 

U.S. Army Research Institute for the Behavioral and Social Sciences
5001 Eisenhower Avenue, Alexandria, Virginia 22333-5600

Office,  Deputy  Chief  of  Staff  for  Personnel 
Department  of  the  Army 

June  1989 


Army Project Number 2Q263739A793
Human Factors Evaluation


Approved for public release; distribution is unlimited.


FOREWORD


The U.S. Army Research Institute for the Behavioral and Social Sciences
(ARI), Field Unit at Fort Hood, Texas, actively guided this revision of their
10-year-old Questionnaire Construction Manual (P-77-1). The questionnaire
construction manual was designed to guide individuals who develop and/or
administer questionnaires as part of Army operational tests. It is, however,
suitable for a variety of disciplines and occupations. Guidance is provided
in the development of questionnaire items, administration procedures, types of
questionnaire items, attitude scales and scaling techniques, response
anchoring and response alternatives, format considerations, pretests,
interviews, demographic characteristics, and evaluation of results.

This product was completed under Program Task 1.5.1, "Soldier/System
Considerations in Force Development User Testing (Advanced Development)." ARI
and the Sponsor for the product work under a "Memorandum of Agreement between
ARI and Training and Doctrine Command (TRADOC) Combined Arms Test Activity
(TCATA)" that was signed in May 1981. The Chief of TCATA's Methodology and
Analysis Section has been briefed on the product content. TCATA has been
using the predecessor Questionnaire Construction Manual to test officers for
over 10 years and would like to use the updated product.


EDGAR M. JOHNSON
Technical Director


ACKNOWLEDGMENTS


Several people helped to prepare this manual. A special acknowledgment
goes to Dr. Frederick A. Muckler, Essex Corporation, for his guidance and
continuous support during all aspects of the preparation of this report. The
contribution of Mr. George M. Gividen, U.S. Army Research Institute for the
Behavioral and Social Sciences, is most gratefully acknowledged. Mr. Clarence
A. Semple, Essex Corporation, contributed generously in editing. Mrs. Joan M.
Funk, Essex Corporation, merits special recognition for her technical
assistance in preparing and editing the manuscript.


QUESTIONNAIRE CONSTRUCTION MANUAL


EXECUTIVE SUMMARY


This manual updates the 10-year-old "Questionnaire Construction Manual."
The revision was prepared primarily by the Essex Corporation under contract to
the Army Research Institute for the Behavioral and Social Sciences (ARI). It
has the same purpose as the earlier version: to provide guidance for those who
construct and/or administer questionnaires as part of Army operational tests
and evaluations such as those conducted by the TRADOC Combined Arms Test
Activity and the Operational Test and Evaluation Agency. Much of the content is
applicable to more than operational test situations; the manual should prove
useful to all persons involved in the construction and administration of
surveys, interviews, or questionnaires.

In 1975, Operations Research Associates reviewed the research literature
on the construction and administration of questionnaires and interviews. They
produced two products. One was the forerunner of this manual. It was titled
"Questionnaire Construction Manual" and was published by ARI in 1976. A
revision was done in 1976 and issued in quantity in 1977 as ARI Special
Publication P-77-1. The other product was a report of the literature survey and a
bibliography of the articles examined. It was issued in 1977 as P-77-2, with
the title "Questionnaire Construction Manual Annex: Literature Survey and
Bibliography."

In 1983, the literature was again reviewed, but only from the point where
ORA's review had ended in 1975. Analysis of the more recent literature
provided the basis for the revision to the manual. A report of the literature
survey has been published under the title, "Questionnaires: Literature Survey
and Bibliography."


QUESTIONNAIRE CONSTRUCTION MANUAL


CONTENTS

I. INTRODUCTION
    A. Purpose and Organization of This Manual
    B. Definition of Questionnaire
    C. Conventions Used in This Manual
    D. Keeping This Manual Up to Date
    E. Reporting Problems and Suggestions for Improvement

II. MAJOR QUESTIONNAIRE TYPES AND ADMINISTRATION PROCEDURES
    A. Overview
    B. Types of Questionnaires Discussed in This Manual
    C. Ways That Questionnaires Can Be Administered
    D. Structured Interviews Versus Other Types of Questionnaires

III. CONTENT OF QUESTIONNAIRE ITEMS
    A. Overview
    B. Determining Questionnaire Content Preliminary Research
    C. Other Considerations Related to Questionnaire Content

IV. TYPES OF QUESTIONNAIRE ITEMS
    A. Overview
    B. Open-Ended Items
    C. Multiple Choice Items
    D. Rating Scale Items
    E. Behavioral Scale Items
    F. Ranking Items
    G. Forced Choice Items
    H. Card Sorting Items/Tasks
    I. Semantic Differential Items
    J. Other Types of Items

V. ATTITUDE SCALES AND SCALING TECHNIQUES
    A. Overview
    B. Thurstone Scales
    C. Likert Scales
    D. Guttman Scales
    E. Other Scaling Techniques

VI. PREPARATION OF QUESTIONNAIRE ITEMS
    A. Overview
    B. Mode of Items
    C. Wording of Items
    D. Difficulty of Items
    E. Length of Question/Stem
    F. Order of Question/Stems
    G. Number of Response Alternatives
    H. Order of Response Alternatives

VII. RESPONSE ANCHORING
    A. Overview
    B. Types of Response Anchors
    C. Anchored Versus Unanchored Scales
    D. Amount of Verbal Anchoring
    E. Procedures for the Selection of Verbal Scale Anchors
    F. Scale Balance, Midpoints, and Polarity

VIII. EMPIRICAL BASES FOR SELECTING MODIFIERS FOR RESPONSE ALTERNATIVES
    A. Overview
    B. General Considerations in the Selection of Response Alternatives
    C. Selection of Response Alternatives Denoting Degrees of Frequency
    D. Selection of Response Alternatives Using Order of Merit Lists
       of Descriptor Terms
    E. Selection of Response Alternatives Using Scale Values and
       Standard Deviations
    F. Sample Sets of Response Alternatives

IX. PHYSICAL CHARACTERISTICS OF QUESTIONNAIRES
    A. Overview
    B. Location of Response Alternatives Relative to the Stem
    C. Questionnaire Length
    D. Questionnaire Format Considerations
    E. Use of Answer Sheets
    F. Use of Branching

X. CONSIDERATIONS RELATED TO QUESTIONNAIRE ADMINISTRATION
    A. Overview
    B. Instructions
    C. Anonymity for Respondents
    D. Motivational Factors
    E. Administration Time
    F. Characteristics of Administrators
    G. Administration Conditions
    H. Training of Field Test Evaluators
    I. Other Factors Related to Questionnaire Administration

XI. PRETESTING OF QUESTIONNAIRES
    A. Overview
    B. Guidelines for Pretesting Questionnaires

XII. CHARACTERISTICS OF RESPONDENTS THAT INFLUENCE QUESTIONNAIRE RESULTS
    A. Overview
    B. Social Desirability and Acquiescence Response Sets
    C. Other Response Sets or Errors
    D. Effects of General Attitudes of Respondents
    E. Effects of Demographic Characteristics on Responses

XIII. EVALUATING QUESTIONNAIRE RESULTS
    A. Overview
    B. Scoring Questionnaire Responses
    C. Data Analyses

XIV. INTERVIEW CONSIDERATIONS
    A. Overview
    B. Structured and Unstructured Interviews
    C. Interviewer's Characteristics Relative to Interviewee
    D. Situational Factors
    E. Training Interviewers
    F. Data Recording and Reduction
    G. Special Interviewer Problems

LIST OF TABLES

Table VIII-B-1. Words considered unratable by subjects
Table VIII-B-2. Words evoking bimodality of response
Table VIII-B-3. Sample list of phrases denoting degrees of acceptability
Table VIII-B-4. A second sample list of phrases denoting degrees of
                acceptability
Table VIII-B-5. Candidate midpoint terms' scale values and standard
                deviations as determined by several different studies
Table VIII-C-1. Degrees of frequency
Table VIII-D-1. Order of merit of selected descriptive terms
Table VIII-D-2. Order of merit of descriptive terms using "use" as a
                descriptor
Table VIII-E-1. Acceptability phrases
Table VIII-E-2. Degrees of excellence: First set
Table VIII-E-3. Degrees of excellence: Second set
Table VIII-E-4. Degrees of like and dislike
Table VIII-E-5. Degrees of good and poor
Table VIII-E-6. Degrees of good and bad
Table VIII-E-7. Degrees of agree and disagree
Table VIII-E-8. Degrees of more and less
Table VIII-E-9. Degrees of adequate and inadequate
Table VIII-E-10. Degrees of acceptable and unacceptable
Table VIII-E-11. Comparison phrases
Table VIII-E-12. Degrees of satisfactory and unsatisfactory
Table VIII-E-13. Degrees of unsatisfactory
Table VIII-E-14. Degrees of pleasant
Table VIII-E-15. Degrees of agreeable
Table VIII-E-16. Degrees of desirable
Table VIII-E-17. Degrees of nice
Table VIII-E-18. Degrees of adequate
Table VIII-E-19. Degrees of ordinary
Table VIII-E-20. Degrees of average
Table VIII-E-21. Degrees of hesitation
Table VIII-E-22. Degrees of inferior
Table VIII-E-23. Degrees of poor
Table VIII-E-24. Descriptive phrases
Table VIII-F-1. Sets of response alternatives selected so phrases are at
                least one standard deviation apart and have parallel
                wording
Table VIII-F-2. Sets of response alternatives selected so that intervals
                between phrases are as nearly equal as possible
Table VIII-F-3. Sets of response alternatives selected from lists giving
                scale values only
Table VIII-F-4. Sets of response alternatives selected using order of
                merit lists of descriptor terms


LIST OF FIGURES

Figure IV-B-1. Examples of open-ended items
Figure IV-C-1. Examples of multiple choice items
Figure IV-D-1. Examples of numerical rating scale items
Figure IV-D-2. Example of graphic rating scale item
Figure IV-D-3. Examples of discrete and continuous scales used to rate
               perception of tones
Figure IV-E-1. Example of BARS's seven dimensions describing technician
               behavior
Figure IV-E-2. Example of BARS items representing performance and effort
               on the job
Figure IV-E-3. Example of BOS item representing description of foreman's
               job
Figure IV-E-4. Example of MSS items representing highway patrol stopping
               vehicles for violations
Figure IV-F-1. Examples of ranking items
Figure IV-G-1. Examples of forced choice items
Figure IV-I-1. Examples of semantic differential items
Figure IV-J-1. Examples of checklists
Figure IV-J-2. Example of checklist pertaining to equipment problems
Figure IV-J-3. Examples of formats providing for supplementary responses
Figure VI-C-1. Example of question form and incomplete statement form of
               stem
Figure VI-C-2. An insufficiently detailed question stem, plus revision
Figure VI-C-3. Examples of loaded questions
Figure VI-C-4. Examples of leading questions
Figure VI-C-5. Example of a threatening question
Figure VI-C-6. Example of a question asking the respondent to criticize
Figure VI-C-7. Examples of compound questions and alternatives
Figure VI-C-8. Example of ambiguous question and alternative
Figure VI-C-9. Example of ambiguity of wording
Figure VI-C-10. Alternate ways of expressing directionality and intensity
Figure VI-D-1. Example of hard to understand item and alternative
Figure VI-F-1. Example of Bradley Fighting Vehicle Questionnaire for
               multiple groups
Figure VI-H-1. Example of rating scale item with alternate ordering of
               response alternatives
Figure VII-B-1. Types of response anchors
Figure VII-F-1. Examples of scale balance, midpoints, and polarity
Figure VIII-B-1. Inclusion of the "Don't Know" response alternative for a
                 maintenance vehicle questionnaire
Figure VIII-B-2. Two formats using "outstanding" and "superior"
Figure VIII-B-3. Response alternatives frequently recommended by ARI
Figure IX-B-1. Arrangement of items with same rating scale response
               alternatives
Figure IX-D-1. Original questionnaire format and modified questionnaire
               format
Figure X-C-1. An example of a Privacy Act statement


QUESTIONNAIRE CONSTRUCTION MANUAL

I-A Page 1
8 Mar 85
(s. 1 Jul 76)


Chapter  I:  Introduction 
A.  Purpose  and  Organization  of  This  Manual 

1.  Purpose 

This manual has been prepared primarily for the use and guidance of
those who are tasked to develop and/or administer questionnaires as
part of Army field tests and evaluations, such as those conducted
by the TRADOC Combined Arms Test Activity (TCATA), the Combat
Developments Experimentation Command (CDEC), the Operational Test and
Evaluation Agency (OTEA), and the several Army Boards and Schools.
The general content and concepts, however, are applicable to a
variety of situations. As such, the manual should prove useful to
all individuals involved in the construction and administration of
surveys, interviews, or questionnaires.

2.  Organization 

Information and guidance relating to the preparation of items for
questionnaires and for their assembly and arrangement into a
complete questionnaire are presented in Chapters II through X.
Chapter XI discusses the importance of, and procedures for, pretesting
questionnaires prior to their regular administration. Chapter XII
discusses characteristics of respondents that influence
questionnaire results. The analysis and evaluation of responses to a
questionnaire are briefly dealt with in Chapter XIII. Finally, a
number of considerations regarding the presentation of questions by
means of an interview are discussed in Chapter XIV.




I-B Page 1
8 Mar 85
(s. 1 Jul 76)


B. Definition of Questionnaire

As used in this manual, the word "questionnaire" refers to an ordered
arrangement of items (questions, in effect) intended to elicit the
evaluations, judgments, comparisons, attitudes, beliefs, or opinions of
personnel. The content and format of the items may vary widely. A
visual mode of presenting the items is employed. In the past, this
meant that the items were typed or printed on paper, but now items can
also be presented by closed circuit television or on a cathode ray tube
(CRT) or on a video display terminal (VDT) under the control of a
computer program. If the items are first read by an interviewer and then
given verbally to the respondent, the questionnaire may also be termed
a "structured interview." Hence, questionnaires and interviews have
some common properties. Questionnaire items used to be responded to by
scribing words or marks with a pen or pencil, but this aspect too has
been enlarged to include typed, punched, button-pushing, light-penned,
joystick, and verbal responses.

While questionnaires are "data collection forms," not all data
collection forms are questionnaires. Those forms used by personnel to enter
instrument readings or to record their counts or observations (e.g.,
time of first detection, number of targets correctly identified, number
of rounds fired) are not directly addressed in this manual.


I-C  Page  1 
8  Mar  85 
(s.  1  Jul  76) 


C. Conventions Used in This Manual

1. Identification Scheme Used

This manual has been prepared in outline form to facilitate
cross-referencing and later updating. The identification scheme that is
used employs Roman numerals, capital and small letters, and numbers
in the sequence: I A 1 a (1) (a) [1] [a]. The major divisions, I,
II, III, IV, etc., are called chapters. All other subdivisions are
called "sections," with sections starting with capital letters (A,
B, etc.) called "major sections." You are now, for example,
reading Section I-C 1. To facilitate later updating, references within
the manual are to sections and not pages.

2.  Pagination 

Each major section of this manual (e.g., I-C) starts on a new page,
and pages are numbered within each major section. For example,
this is Section I-C Page 1, or the first page of Section I-C.

3.  Page  Update  Date 

Immediately under each page number is the date that the page was
drafted or revised. When a page has been revised, the date of the
immediately previous version is also given in parentheses with the
letter "s" meaning "superseded." For example, III-B Page 1 dated 1
Jul 76 was revised on 8 Mar 85. The page number on the revised
page would appear as:

III-B  Page  1 
8  Mar  85 
(s.  1  Jul  76) 

When updating the manual, new material that was not previously part
of the text would not require the letter "s." For example, IV-E
Page 6 originated on 8 Mar 85 would appear as:

IV-E Page 6
8 Mar 85
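
The page identification block described above has a regular shape, so it
can be checked mechanically when update pages are filed. The following is
only a minimal sketch in Python; the pattern, the PageHeader class, and the
parse_page_header function are illustrative assumptions and are not part of
the manual.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for the page identification block:
# line 1: major section and page number, line 2: page date,
# optional line 3: "(s. <date>)" giving the superseded version's date.
HEADER_RE = re.compile(
    r"^(?P<chapter>[IVXLC]+)-(?P<section>[A-Z])\s+Page\s+(?P<page>\d+)\s*\n"
    r"(?P<date>\d{1,2} [A-Z][a-z]{2} \d{2})"
    r"(?:\s*\n\(s\. (?P<superseded>\d{1,2} [A-Z][a-z]{2} \d{2})\))?\s*$"
)


@dataclass
class PageHeader:
    chapter: str               # Roman numeral, e.g. "III"
    section: str               # capital letter of the major section, e.g. "B"
    page: int                  # page number within the major section
    date: str                  # date the page was drafted or revised
    superseded: Optional[str]  # date of the previous version, if any


def parse_page_header(text: str) -> PageHeader:
    """Parse a two- or three-line page identification block."""
    m = HEADER_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a page identification block: {text!r}")
    return PageHeader(
        m["chapter"], m["section"], int(m["page"]), m["date"], m["superseded"]
    )
```

A block with no "(s. ...)" line parses with superseded set to None, which
corresponds to new material that never replaced an earlier page.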

4.  Table  and  Figure  Identification 

Both tables and figures are numbered sequentially within a major
section, with a hyphen before the table or figure number. Examples
are: Table VIII-B-1, Table VIII-B-2, Figure VI-A-1.
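
As an illustration of sequential numbering within a major section, the
sketch below (Python) works out the next free table or figure label. The
function names parse_label and next_label are hypothetical and serve only to
show the rule.

```python
import re

# Label form assumed from the examples in the text: "Table VIII-B-1".
LABEL_RE = re.compile(r"^(Table|Figure) ([IVXLC]+)-([A-Z])-(\d+)$")


def parse_label(label):
    """Split a label into (kind, major section, sequence number)."""
    m = LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    kind, chapter, section, number = m.groups()
    return kind, f"{chapter}-{section}", int(number)


def next_label(existing, kind, major_section):
    """Return the next sequential label of `kind` in `major_section`.

    Tables and figures are counted separately, and numbering restarts
    in each major section.
    """
    used = [
        number
        for k, section, number in map(parse_label, existing)
        if k == kind and section == major_section
    ]
    return f"{kind} {major_section}-{max(used, default=0) + 1}"
```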




I-D Page 1
1 Jul 76

D. Keeping This Manual Up to Date

1. Updated Pages Should Be Inserted as Received

It  is  anticipated  that  sections  of  this  manual  will  be  periodically 
corrected, revised, or otherwise updated. New pages should be
inserted as soon as they are received. This will not only keep the
manual  up  to  date,  but  will  facilitate  adding  pages  received  at  an 
even  later  date.  Appropriate  instructions  covering  which  pages  to 
add  and  delete  will  accompany  distributed  update  pages.  When  it 
appears  useful,  a  list  will  also  be  provided  showing  the  page 
numbers  and  dates  of  all  pages  that  should  be  in  the  manual  at  that 
time. 

2.  Request  for  Updates 

To  be  placed  on  the  distribution  list  to  receive  updates  to  this 
manual,  write  to: 


Chief 

ARI  Field  Unit-Fort  Hood 
HQ  TCATA  (PERI-OH) 

Fort  Hood,  Texas  76544-5065 




I-E  Page  1 
1  Jul  76 


E. Reporting Problems and Suggestions for Improvement

As previously noted, it is anticipated that this manual will
periodically be updated to improve its utility. To report errors, problems,
or suggestions, write to:

Chief

ARI Field Unit-Fort Hood
HQ TCATA (PERI-OH)

Fort Hood, Texas 76544-5065




II-A  Page  1 
8  Mar  85 
(s.  1  Jul  76) 


Chapter  II:  Major  Questionnaire  Types  and  Administration  Procedures 
A.  Overview 


This  chapter  briefly  summarizes  the  different  types  of  questionnaires 
discussed  in  this  manual  (Section  II-B)  and  ways  that  questionnaires 
may  be  administered  (Section  II-C).  Detailed  guidelines  regarding 
what  to  do  in  a  given  situation  are  included  in  subsequent  chapters. 
Issues  to  consider  when  deciding  whether  to  use  a  structured  interview 
or  some  other  type  of  questionnaire  are  presented  in  Section  II-D, 
which  also  notes  that  combinations  of  methods  may  be  employed.  It  is 
concluded that both structured interviews and other types of
questionnaires have their place. Each has strengths and limitations which must
be taken into account when identifying which instruments to use.




II-B Page 1
8 Mar 85
(s. 1 Jul 76)


B. Types of Questionnaires Discussed in This Manual

There are a number of techniques of data collection that can be used to
measure human attributes, attitudes, opinions, and behavior. Attitude
and opinion are closely aligned if not overlapping. Opinions are
restricted to verbalized attitudes. Attitudes are sometimes
unconscious or nonverbalized. Some of the methods of data collection are
observation, personal and public records, specific performances,
sociometry, interviews, questionnaires, rating scales, pictorial techniques,
projective techniques, achievement testing, and psychological testing.
For this manual, however, attention has been restricted to a more
limited number of data collection techniques: certain paper-and-pencil
types of instruments broadly classed as questionnaires as defined in
Section I-B, and including only some of the techniques mentioned
above. A distinction has also been made in this manual between
open-ended questionnaire items and closed-end items. Open-ended items are
those which permit respondents to express their opinions in their own
words, and to indicate any qualifications they wish. The amount of
freedom the respondent will be given in expressing an answer to an
open-ended item is partly determined by the questionnaire designer.
Closed-end items use response alternatives. Respondents are directed
to select one or more of the response alternatives from a closed set.
Closed-end items frequently used are multiple choice, true-false,
checklist, rating scale, and forced-choice. Survey items have been
roughly classified into two groups: open-ended items and closed-end
items.
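
For questionnaires presented under computer control, the distinction
between the two groups of items can be made concrete in a small data
structure. The sketch below is in Python; the class names, the
max_selections field, and the validation rules are assumptions made for
illustration, not definitions taken from this manual.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OpenEndedItem:
    """Item answered in the respondent's own words."""
    stem: str

    def is_valid_response(self, response: str) -> bool:
        # Any non-blank free text is accepted.
        return bool(response.strip())


@dataclass
class ClosedEndItem:
    """Item answered by choosing from a closed set of alternatives."""
    stem: str
    alternatives: List[str]
    max_selections: int = 1  # 1 for multiple choice; more for a checklist

    def is_valid_response(self, selected: List[str]) -> bool:
        chosen = set(selected)
        return (
            1 <= len(chosen) <= self.max_selections
            and len(chosen) == len(selected)      # no alternative repeated
            and chosen <= set(self.alternatives)  # only listed alternatives
        )
```

The only structural difference is that a closed-end item carries its own
set of response alternatives, so a response can be checked against that set,
while an open-ended response can only be checked for being non-blank.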

It is common to use interview surveys to ask questions and record
answers. Structured interviews are included within the definition of
questionnaires used, since typically an interview form is developed and
used by an interviewer both for asking questions and recording
responses, much like a self-administered questionnaire. On the other
hand, the unstructured interview makes no use of structured data
collection forms. The interviewers are permitted to discuss the subject
matter as they see fit with no particular order or sequence. Of
course, other interviews fall somewhere between these two extremes. In
any case, unstructured interviews, where no structured response forms
are used, are not included within the definition of questionnaires used
in this manual.




II-C  Page  1 
8  Mar  85 
(s.  1  Jul  76) 


C. Ways That Questionnaires Can Be Administered


There are a number of respects in which questionnaire administrations
may vary. However, in the usual field test settings, the typical
questionnaire administration situation involves paper-and-pencil materials
with the author/test officer administering the questionnaire
face-to-face with a group of test players or evaluators.


1.  Group  Versus  Individual  Administration 


Given a printed questionnaire, calendar time is saved by group
administration. Group administration allows the opportunity for a
questionnaire administrator to explain the survey and answer
questions about items. The task of statistical analysis can be
initiated with less delay than if one were waiting on a series of
individual administrations. An important determinant of group vs.
individual administration is the time at which people complete their
participation in the test. Most often all participants are through at
the same time. All would be available for questionnaire administration
as soon as they could be brought to an appropriate place or places.
Prompt group administration gives the same short amount of time for
forgetting about test events by those who become the respondents.
Group administration generally has a high cooperation rate. If
there is an administrator, his/her time is conserved directly in
proportion to the number of respondents he/she has in each
administrative session. An advantage of group administration is low
cost.
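
The proportionality between session size and administrator time can be
illustrated with a little arithmetic. The following is a minimal sketch
and not part of the original manual: the function name and the figures
are hypothetical, and it assumes that a session takes the same amount
of administrator time no matter how many respondents attend.

```python
import math


def administrator_minutes(respondents: int, group_size: int,
                          session_minutes: float) -> float:
    """Total administrator time needed to survey all respondents.

    Assumption (not from the manual): every session lasts
    session_minutes regardless of how many respondents attend.
    """
    if respondents < 0 or group_size < 1 or session_minutes < 0:
        raise ValueError("respondents and session_minutes must be >= 0, "
                         "group_size must be >= 1")
    # Number of sessions needed to seat everyone.
    sessions = math.ceil(respondents / group_size)
    return sessions * session_minutes


# Illustrative figures only: 30 respondents, 20-minute sessions.
individual = administrator_minutes(30, group_size=1, session_minutes=20)
grouped = administrator_minutes(30, group_size=30, session_minutes=20)
print(individual, grouped)
```

Under these assumptions, one session of 30 costs the administrator one
session's worth of time, while 30 individual sessions cost 30 times as
much.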


2. Author-Administered Questionnaires


When the test officer or administrator who is familiar with the
content of the questionnaire and the test's purposes/objectives can
administer the questionnaire, some advantages can be gained. The
administrator's instructions and appeals may increase the number of
respondents having desirable motivation to complete the
questionnaire by giving appropriate consideration to each item. If one
employs a self-administration procedure, such as might occur in a
mailed-out questionnaire, or if a poorly prepared stand-in plays
the role of administrator, then the respondents must derive their
instructions and some of their motivation from printed instructions
(or from the poorly prepared stand-in). More things usually go wrong
when questionnaires are self-administered than when they are
administered by a test administrator.


3.  Remote  Administrations 


From the test officers' point of view, remote administration refers
to a questionnaire administration event that they cannot conduct
because of its distance from them and/or other demands on their
time. This dimension, remote versus face-to-face, is similar but
not identical to the previously noted dimension, self-administered
versus author-administered.




II-C  Page  2 
8  Mar  85 
(s. 1 Jul 76)

To avoid the possible disadvantages of self-administered
questionnaires, the test officer must be able to afford another
administrator, train him/her in the knowledge and skills associated with
effective administration, and transport him/her to the "remote"
administration location. If multiple administrations are required, and
their location or timing differences preclude the same administrator
from handling them, it would appear that the chances are increased
that more respondents will experience more "difficulties" in answering
the questions. For this type of questionnaire administration, the
questionnaire itself would require careful design associated with
items and instructions.

4.  Other  Materiel  Modes 

Providing the respondents with a printed questionnaire form, and a
pencil to mark/write their responses, is the most common
questionnaire administration procedure in field evaluations. In addition,
other presentation modes have been used. In a card-sorting
procedure that has been used with individuals and groups, each
respondent reads statements of candidate problems and then places the
card into the appropriate pile according to his/her judgment of the
severity of the problem. Rarer because of expense and logistics
problems is the setting up of a computer terminal where each
respondent enters (types in) answers to questions that are displayed
on a cathode ray tube (or other computer display device). Chapter
XII presents many other considerations related to questionnaire
administration.
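
The card-sorting procedure described above yields, for each respondent,
a pile assignment for every problem statement. The following is a
minimal sketch of one way such sorts might be summarized. It is not
taken from the manual: the pile labels (1 = minor, 2 = moderate,
3 = severe), the problem statements, and the choice of the median as
the summary are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical data: each respondent's sort maps a problem statement
# to the severity pile chosen (1 = minor, 2 = moderate, 3 = severe).
sorts = {
    "respondent_1": {"strap slips": 3, "hard to read dial": 1},
    "respondent_2": {"strap slips": 2, "hard to read dial": 1},
    "respondent_3": {"strap slips": 3, "hard to read dial": 2},
}


def median_severity(all_sorts):
    """Return {statement: median pile number} across all respondents."""
    piles = defaultdict(list)
    for placement in all_sorts.values():
        for statement, pile in placement.items():
            piles[statement].append(pile)
    return {statement: median(values) for statement, values in piles.items()}


summary = median_severity(sorts)
print(summary)
```

The median is used here only because pile numbers are ordered
categories; other summaries (for example, the count of cards in each
pile) may suit a given test better.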




II-D  Page  1 
8  Mar  85 
(s.  1  Jul  76) 

D. Structured Interviews Versus Other Types of Questionnaires
1.  Issues  to  Consider 

When deciding whether to use a structured interview or another type
of questionnaire, a number of issues should be considered.

Included are the following:

a. To develop questionnaire items, a focus group may be
interviewed. Their comments can be used to develop hypotheses and
refine questions. This information can be adapted to an
interview guide and interview items.

b. Interview items should not use a dichotomous response set.
Multiple choice and open-ended questions provide the
opportunity for probing.

c. If a structured interview is used, there must be enough
qualified interviewers to expeditiously process all interviewees.
Sometimes there are only a few personnel to be interviewed, or
there is plenty of time available for interviews, so only one
or two interviewers will be necessary. In other situations,
only an hour or so may be available per interviewee; in
these cases, a large number of qualified interviewers must be
available.

d. Face-to-face interviews have a higher response rate than mail
surveys.

e. In most cases, respondents have a greater tendency to answer
open-ended questions in an interview than when response is by
paper and pencil.

f. It is possible to adapt face-to-face interview guides for
telephone surveys. Oral labeling of the scale points should be
assessed on a pilot survey to be sure that the responses are
not biased by the oral presentation of the scale.

g. Telephone interviews are faster to perform than mail surveys.

h. Interviews conducted by telephone require an interview
structure that promotes a high degree of interaction between the
interviewer and respondent.

i. Group-administered paper-and-pencil questionnaires may be less
expensive, more anonymous, and completed faster than the same
number of interviews.

j. Respondents seem to be less likely to report unfavorable things
in an interview than in an anonymous questionnaire. Typically,
questionnaires are also more likely than interviews to produce
self-revealing data.




II-D Page 2
8 Mar 85
(s. 1 Jul 76)

k. Issues involving socially acceptable or unacceptable attitudes
and behaviors will elicit more response bias.

l. During interviews, respondents often have a tendency to try to
support the norms that they assume the interviewer adheres to.

m. Interviewers with biases on the issues under discussion may
reflect them in the content they record, as well as in what
they fail to record.

n. Ethnic background differences between interviewer and
respondent probably will not influence the survey results unless the
items have a racial content or are found to be threatening.

o. Although a structured interview using open-ended questions may
produce more complete information than a typical questionnaire
containing the same questions, empirical research seems to
indicate that responses to the typical questionnaire are more
reliable; i.e., more consistent. Structured interviews using
closed-end questions appear to be as reliable as
paper-and-pencil questionnaires.

p. It may be difficult to code a combination of open-ended and
closed-end items for interview surveys. (See Section XIII-B,
Scoring Questionnaire Responses.)
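
The staffing consideration in item c can be worked through as simple
arithmetic. The sketch below is an illustration only and is not part of
the original manual: the function name, the parameters, and the figures
are hypothetical, and it assumes that all interviews take the same
length of time and that all interviewers work in parallel for the whole
time window.

```python
import math


def interviewers_needed(interviewees: int, interview_minutes: int,
                        window_minutes: int) -> int:
    """Smallest number of interviewers who can see everyone in the window.

    Assumptions (hypothetical): equal-length interviews, and every
    interviewer is available for the entire window.
    """
    if interviewees < 0:
        raise ValueError("interviewees must be >= 0")
    if interview_minutes <= 0 or window_minutes < interview_minutes:
        raise ValueError("the window must fit at least one interview")
    # Whole interviews one interviewer can complete in the window.
    per_interviewer = window_minutes // interview_minutes
    return math.ceil(interviewees / per_interviewer)


# Plenty of time: 6 people, 30-minute interviews, an 8-hour day.
relaxed = interviewers_needed(6, 30, 480)
# Tight schedule: 40 people, 60-minute interviews, one hour available.
tight = interviewers_needed(40, 60, 60)
print(relaxed, tight)
```

With ample time a single interviewer suffices; when everyone must be
interviewed within the same hour, one interviewer per interviewee is
required.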

2.  Combinations  of  Methods 

There are some situations where a combination of methods of
questioning might be used:

a. An interview might be used to obtain information for designing
a paper-and-pencil questionnaire.

b. Personal interviews or telephone interviews might be used for
respondents who do not return questionnaires administered
remotely (such as mail questionnaires).

c. When respondents are unable to give complete information during
an interview, they can be left a copy of a questionnaire to
complete and mail in, so that the necessity for a call-back is
eliminated.

3.  Conclusion 

Both structured interviews and other types of questionnaires appear
to have their advantages and disadvantages. The choice of which to
use may well depend upon costs, which are generally lower for the
typical questionnaire. The typical questionnaire is apparently
more reliable, while the structured interview may provide more
unique and more abundant information. If the dimensions of a
problem have not been explored before, the best compromise would
appear to be to use the interview approach with open-ended items to
uncover the dimensions, and follow this by the use of the
paper-and-pencil questionnaire with closed-end items to obtain more
specific information.




III-A  Page 
1  Jul  76 


Chapter  III:  Content  of  Questionnaire  Items 

A.  Overview 


The recommended general steps in preparing a questionnaire include
preliminary planning, determining the content of questionnaire items,
selecting question forms, wording of questions, formulating the
questionnaire, and pretesting. As part of preliminary planning, the
information required has to be determined, as do procedures required for
administration, sample size, location, frequency of administration,
experimental design of the field test, and analyses to be used.
Selecting question forms is a function of the content of the
questionnaire items and requires knowledge of types of questionnaire items and
scaling techniques. The wording of questions is the most critical and
most difficult step. Formulating the questionnaire includes
formatting, sequencing of questions, consideration of data reduction and
analysis techniques, determining basic data needed, and insuring
adequate coverage of required field test data. Pretesting involves using
a small but representative group to insure that all questions are
understandable and unambiguous.

This chapter considers the content of questionnaire items. Methods for
determining questionnaire content are discussed first, and then other
considerations related to questionnaire content are presented. The
other steps noted above are discussed in subsequent chapters.




III-B Page 1
8 Mar 85
(s. 1 Jul 76)


B. Determining Questionnaire Content
1.  Preliminary  Research 


If you have the job of developing a questionnaire for a field test,
there are several things that should be done before starting to
write questionnaire items.

a. Learn the test's objectives and issues. Read the Outline Test
Plan in order to learn what it says the test's purpose, scope,
and objectives are. All data collection effort, including
questionnaire administration, should be consistent with and
supportive of the test's objectives. Read the Independent
Evaluation Plan, with its discussion of issues and of ways of
collecting data on the issues.

b. What performance measures are planned for the test? One may be
fortunate enough to be involved with a test for which the
Detailed Test Plan has to a large extent been written. Try to
discover what performance measures/data are to be collected.

If performance data are to be collected on some aspects of the
functioning of the system to be tested, then it may not be
necessary to assess these functions via questionnaire items.
Make a list of what should be measured to meet the objectives
of the field test. The list will include variables that are
configured into categories. The list should not include any
questions.

c. Consult others and prior test plans and reports. Many tests at
CDEC and TCATA (and elsewhere) follow up, or are similar to,
prior testing. As a consequence, information may be readily
available regarding prior related or similar tests. Test files
or the Technical Information Center may provide a source for
obtaining test plans and reports on relevant prior tests
conducted by Army field test/experimentation agencies.

d. Consult others and develop an analysis plan. The Technical
Information Center may provide guidance for data analysis.
Develop an analysis plan with a list of variables to be
measured. The analysis plan identifies dependent and independent
variables. It also identifies which variables to control and
any intervening variables.

Preliminary  research  requires  an  understanding  of  the  objectives  of 
the  test  plan,  a  list  of  the  variables  to  be  measured,  and  a  plan 
for  analysis  of  the  data. 
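
The analysis plan described in item d can be thought of as a list of
variables grouped by role. The following is a minimal sketch of such a
record, offered as an illustration rather than a prescribed format. The
class name, field names, and example variables are hypothetical and do
not come from the manual.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisPlan:
    """Variables to be measured, grouped by their role in the analysis."""
    dependent: list = field(default_factory=list)
    independent: list = field(default_factory=list)
    controlled: list = field(default_factory=list)
    intervening: list = field(default_factory=list)

    def all_variables(self):
        """Every variable named in the plan, in role order."""
        return (self.dependent + self.independent
                + self.controlled + self.intervening)

    def duplicated(self):
        """Variables that appear under more than one role."""
        seen, repeats = set(), set()
        for name in self.all_variables():
            if name in seen:
                repeats.add(name)
            seen.add(name)
        return sorted(repeats)


# Hypothetical plan for an equipment comparison test.
plan = AnalysisPlan(
    dependent=["rated ease of operation"],
    independent=["equipment type"],
    controlled=["time of day"],
    intervening=["operator fatigue"],
)
print(plan.all_variables(), plan.duplicated())
```

A check such as `duplicated()` is one simple way to catch a variable
that has been assigned two roles by mistake before any items are
written.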


III-B Page 2
8 Mar 85
(s. 1 Jul 76)

2. Using Interviews to Determine Questionnaire Content

If one's degree of experience seems meager relative to the
complexities of the evaluation problem, he/she may employ group and/or
individual interviews to assist in determining questionnaire
content. Preferably, this would be done after taking the steps noted
above. The less one knows about a subject, the less structure one
can impose on an interview dealing with the subject.

a. Conducting an unstructured group interview. Personnel are
needed who have relevant operating experience with the system
to be tested/evaluated, or with a sufficiently similar system.
Arrange a common meeting place and time with about five to
ten of them. It would be advantageous to have a meeting place
that was not cramped for space, had comfortable chairs, a
comfortable temperature, and where all discussants were free
from other sources of distraction (sights and sounds, mainly).

If the interviewer's age and rank are several steps above or
below the age and rank of the members of a homogeneous group of
discussants, try (before the meeting) to get a person who is
their contemporary (peer) in age and rank to lead and
coordinate the discussions. Why? Because a mismatch may inhibit
their discussion or produce too much submissive, agreeing
behavior on their part.

If notes are being taken or the discussion is being tape
recorded, one should be unobtrusive about it. Don't shove/point
a microphone at people as they start to speak. They may be
inhibited by this, or they may become "hams."

The first several minutes should be spent in establishing
rapport with the group. The purpose of the session should be
covered, introduction of group members made, and other warm-up
devices used. The objective is to motivate as many respondents
to give comments as possible. In the remainder of the session,
any or all of the following information-eliciting devices could
be used:

(1) Discuss samples of the control item. Ask the general
question: "What problems have you had with this piece of
equipment or system?" Follow up with who, what, where,
when and why. Attempt to maximize the number of potential
or actual problems posed. Strive for clarification of
problem ideas, but do not criticize the comments, even if
they are redundant with a previous contribution by the
respondent or other respondents.

(2) Ask: "What do you consider to be the most important
features (characteristics, qualities, etc.) of this
equipment or system when used in the field?" Strive to get a
multitude of adjectives and phrases here (e.g., ease of
operation, weight, durability, portability, etc.).




III-B Page 3
8 Mar 85
(s. 1 Jul 76)


(3) Use the aided recall technique: "Can you remember where
and when you have encountered problems with this system?"
(e.g., at night; when it's damp, etc.).

(4) The way survey issues are discussed will help in selecting
vocabulary and phrasing questions.

(5) Researchers interested in obtaining accurate data from
their interviews generally ask multiple questions for each
topic. The questions are sequenced to provide smooth
transitions throughout the interview. Development of
questionnaire items is based on hypotheses that have been
developed. The hypotheses are presented to a group of
individuals who are subject matter experts, and they
perform a preliminary assessment of the hypotheses. The
questionnaire may require modification if the hypotheses
are not viable.

The recorded comments should be categorized and arranged by
frequency. For example, how many of the comments on system
operation stressed failure considerations?
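
Categorizing recorded comments and arranging them by frequency amounts
to a simple tally. The following is a minimal sketch, assuming the
analyst has already coded each comment with a topic and a category; the
topics, categories, and function name are hypothetical examples and are
not taken from the manual.

```python
from collections import Counter

# Hypothetical coded comments: (topic, category) assigned by the analyst.
coded_comments = [
    ("system operation", "failure"),
    ("system operation", "failure"),
    ("system operation", "ease of use"),
    ("maintenance", "spare parts"),
    ("system operation", "failure"),
    ("maintenance", "spare parts"),
]


def categories_by_frequency(comments, topic):
    """Return (category, count) pairs for one topic, most frequent first."""
    counts = Counter(category for t, category in comments if t == topic)
    return counts.most_common()


operation = categories_by_frequency(coded_comments, "system operation")
print(operation)
```

Here three of the four comments on system operation stressed failure
considerations, which answers the example question posed above for this
made-up data.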

b. Conduct semistructured personal interviews. Information
produced from the unstructured group interviews provides general
guidance on the specific evaluative information desired. As a
next step, or as an alternative step to the group interview,
one may employ a small number of representative respondents in
a person-to-person interview format.

In this method of interviewing, the interviewers are given only
general instructions on the type of information desired. They
are left free to ask the necessary direct questions to obtain
this information, using the wording and the order that seems
most appropriate in the context of each interview. These
interviews, like the unstructured group sessions, are useful in
obtaining a clearer understanding of problems, and in
determining what areas (evaluation criteria) should be included on
the pilot questionnaire.

The only structure to the semistructured interview comes from a
set of question categories that must be raised sometime during
the interview. Questions on system experience, positive and
negative features, and problems in field use, for example, can
be phrased in any manner or sequence. Probing questions of the
type: "Why do you feel that way?," "What do you mean by that
statement?," and "What other reasons do you have?" can be
utilized until the interviewers are satisfied that they have
the necessary information considering time limitations, data
requirements, and the willingness and ability of the
respondents to verbalize their views. Interview forms should be
designed to allow the interviewer sufficient space for writing
notes and comments.




III-B  Page  4 
8  Mar  85 
(s. 1 Jul 76)

In the semistructured interview, the interviewer has some
flexibility in formulating and asking questions. This technique
can, therefore, be only as effective in obtaining complete,
objective, and unbiased information as the interviewer is skilled
in formulating and asking questions. Thus, interviewers may
have to be trained in using this technique.

When interviews are used as the basis for a future
questionnaire, the questions need to be carefully stated so that they
elicit data which will enable the interviewer to
construct questions which address the stated objectives and issues
of the research. Once the questionnaire items have been
identified, the items need to be assembled into a logical sequence.
They then need to be administered to a sample of respondents
who have a background similar to the audience to which the
questionnaire was originally targeted. Information obtained
from the sample administration is used to refine questionnaire
items.

c. Develop the questionnaire. In the development phase of a
questionnaire, an open-ended response format can be useful in
selecting meaningful response alternatives for a multiple
choice format. Open-ended questions administered to a sample
of the target population will provide responses that can then
be phrased in the spontaneous wording of the individuals in the
sample. The questionnaire items can be pretested using an
open-ended response format on respondents who are
representative of the eventual test population. Prior to pretesting the
open-ended questions, the test officer needs to be sensitive to
the phrasing of the questions since inadvertent phrasing of the
open-ended questions can sometimes modify responses in
unrecognized and unintended ways. The use of open-ended response
formats and interviews should enable the formulation of a
questionnaire to obtain evaluative information. These
interviews will provide guidance to the formulation of a sound
survey instrument in the following respects:

(1) A better understanding of the factors or criteria which
make up the mental set of individuals in evaluating
systems and equipment.

(2) Some idea of the range of favorable and unfavorable
opinions toward the system for each factor.

(3) Tentative knowledge of individual and group differential
opinions toward the system tested.

Therefore, before drafting the pretest questionnaire, the
researcher must have a feel for: question categories (e.g.,
problem areas, positive aspects); response categories (e.g.,
evaluative factors); and the type of system operations
information which is needed (e.g., in evaluating a new helmet
suspension system, does respondent wear eyeglasses?).




III-B  Page  5 
8  Mar  85 
(s. 1 Jul 76)

3. Using the Critical Incident Technique to Determine Questionnaire
Content

The critical incident technique consists of a set of procedures for
collecting direct observations of human behavior in such a way as
to facilitate their potential usefulness either in solving
practical problems or in developing broad psychological principles. The
technique calls for collecting observed incidents of behavior that
have special significance and meet systematically defined criteria.
It can be of assistance, therefore, in helping to determine the
content of items to be included in a questionnaire.

Although there are a number of variations in the critical incident
technique, the basic procedure consists of collecting records of
specific behaviors related to the topic of concern. The behaviors
might be noted by observers, or individuals can be asked to recall
and record past specific behaviors judged to provide significant or
critical evidence related to the topic of concern. As appropriate,
behaviors related both positively and negatively to the area of
concern should be noted. The records of behavior that are
collected can then be analyzed and used as a basis for determining
questionnaire content.

One of the examples of the use of the critical incident technique
reported by Flanagan in the articles noted in Section III-B 3 had
to do with a study of combat leadership in the United States Army
Air Forces in 1944. It represented "the first large-scale,
systematic effort to gather specific incidents of effective or
ineffective behavior with respect to a designated activity. The
instructions asked the combat veterans to report incidents observed
by them that involved behavior which was especially helpful or
especially inadequate in accomplishing the assigned mission. The
statement finished with the request, 'Describe the officer's
action. What did he do?' Several thousand incidents were collected
in this way and analyzed to provide a relatively objective and
factual definition of combat leadership. The resulting set of
descriptive categories was called the 'critical requirements' of
combat leadership" (p. 328).

For more information on the critical incident technique, see, for
example, the following two sources:

a. Barnes, T. I. (1960). The critical incident technique.
Sociology and Social Research, 44, 345-347.

b. Flanagan, J. C. (1954). The critical incident technique.
Psychological Bulletin, 51, 327-358.




III-B  Page  6 
1  Jul  76 


4. Using Impressions of a Topic to Determine Attitude Scale Content

When the questionnaire is an attitude scale, a useful method for
selecting items for it is to ask a group of individuals to write
six statements giving their impressions of a topic, such as Army
pay. From these, some smaller number of statements can be selected
that are readable, intelligible, and capable of classification.
These statements can then be sorted into several categories, such
as the status of the topic and its good and bad features.




C. Other Considerations Related to Questionnaire Content

This section discusses a number of topics related to questionnaire
content: questions that should be asked related to questionnaire
content; sources of bias in questionnaire construction; and
characteristics of good questions that affect questionnaire content.

1.  Questions  That  Should  Be  Asked  Related  to  Questionnaire  Content 

Asking  yourself  the  following  five  questions  may  lay  the  foundation 
for  a  far  more  valuable  questionnaire  than  would  otherwise  be 
produced.  If  you  can't  answer  these  questions,  be  sure  to  read  or 
re-read  the  Outline  Test  Plan  and  the  Independent  Evaluation  Plan. 

a. Who needs the information? Knowledge of who needs the
information will provide a source in the event answers are needed to
the following four questions.

b. What decisions will be made based on your information? This
will tell in part why the information is needed. Depending on
what decision is going to be made, some kinds of information
will make a difference and should be collected, and other kinds
will not.

Suppose, for example, information is to be collected as a part
of a test comparing a new item of equipment with an old
standard item. The nature of the decision to be made is clear
enough. It will be either selection of the new equipment, or
retention of the old with which it is being compared. The
basis for the decision will usually also be clear from the
small development requirement (SDR) or qualitative materiel
requirement (QMR) which led to the development of the item
being tested. Analysis of the QMR will identify the
qualitative requirements the new equipment must have, and will give
the start needed to develop questions.

c. What facts will affect the decision? While this may be a
difficult question to answer, trying to do so should identify
items of information that should be sought with the
questionnaire. It may also head off the collection of unnecessary
information.

d. Whom are you asking? To get good information, not only must a
good question be asked, but it must be asked of someone who has
the answer. It would not, for example, be reasonable to ask
support troops in a supply depot questions about combat
operations.


III-C Page 2
8  Mar  85 
(s.  1  Jul  76) 


e. What are the consequences of a wrong answer? While this
basically is an administrative question, it has an important
bearing on field questionnaire design. Clearly, if it makes
little difference which of two alternatives is chosen, it makes
little difference if the information is collected. On the
other hand, if there is a chance that substantial dollar
savings will result from the use of a more effective training
technique, or that millions of dollars will be wasted by buying
a new piece of equipment which is not better than the old, it
is necessary to design tests very well, and ask the right
questions with great care.

2.  Refining  Questions 

Early versions of questions usually need to be refined. The
following approaches will assist in developing better questions:

a. Try out questions on co-workers.

b. Identify problems in question wording prior to pretesting.

c. Pretest the questionnaire, and modify as needed. This should
help in making the questionnaire easier for the respondents to
use, and to assure meeting the objectives of the field test.

3,  Sources  of  Bias  In  Questionnaire  Construction 

Two  primary  sources  of  bias  In  questionnaire  construction  that  have 

been  Identified  are  Investigator- bias  and  question  bias. 

a.  Investigator  bias  arises  from:  choice  of  subject  matter;  study 
design  and  procedure;  unfair  or  loaded  phrasing  of  questions; 
and  Interpretation  and  reporting  of  results.  Sources  of  such 
biases  Include:  the  questionnaire  developers'  relationships 
with  the  clients;  their  personal  Involvement  In  a  particular 
theoretical  position  or  research  technique;  and  those  personal 
traits  attributable  to  class,  race,  or  political  Ideology.  To 
reduce  the  Impact  of  such  bias,  questionnaire  developers  need 
to:  be  aware  of  the  problems;  seek  critiques  from  Independent 
sources;  carefully  review  previously  published  related  reports; 
and  continue  pursuing  technical  Improvement  In  their  Investiga¬ 
tions. 

b. Four ways that have been suggested for minimizing question bias
when asking opinion questions are: ask many questions on the
same topic; determine by scale analysis whether questions ask
the respondents about the same dimensions of opinion (see
Chapter V); ask "How strongly do you feel about this?" after
each opinion question; and relate the content of opinion to the
intensity of feeling.
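
The last two suggestions in item b can be illustrated with a small
tabulation. The sketch below is only an illustration under stated
assumptions: the response records, the answer labels, and the function
name `cross_tabulate` are hypothetical and do not come from any study
cited in this manual. It counts how often each opinion answer is paired
with each answer to the follow-up question on strength of feeling.

```python
from collections import Counter


def cross_tabulate(responses):
    """Count (opinion, intensity) pairs, one pair per respondent."""
    return Counter(responses)


# Hypothetical answers to one opinion question and its intensity follow-up.
responses = [
    ("favor", "strongly"),
    ("favor", "mildly"),
    ("oppose", "strongly"),
    ("favor", "strongly"),
    ("oppose", "mildly"),
]

table = cross_tabulate(responses)
for (opinion, intensity), count in sorted(table.items()):
    print(f"{opinion:7s} {intensity:9s} {count}")
```

A table of this kind lets the analyst see whether one side of an
opinion is held more intensely than the other.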




Chapter IV: Types of Questionnaire Items


IV-A  Page  1 
8  Mar  85 
(s.  1  Jul  76) 


A.  Overview 


This chapter discusses various types of questionnaire items:
open-ended items (Section IV-B), multiple choice items (Section IV-C),
rating scale items (Section IV-D), behavioral scale items (Section
IV-E), ranking items (Section IV-F), forced choice and
paired-comparison items (Section IV-G), card sorting items/tasks
(Section IV-H), and semantic differential items (Section IV-I). For
each of these major item types, definitions and examples are
presented, advantages and disadvantages are noted, and
recommendations regarding their use in Army field test evaluations
are given. Other types of items are noted in Section IV-J:
checklists, matching items, arrangement items, and formats
providing for supplementary responses.

The professional literature differentiates and classifies item types
in a number of ways. Which types are special cases of other types
could be debated at length. Unanimous agreement with the definitions
given in this manual cannot, therefore, be anticipated.




IV-B Page 1
8 Mar 85
(s. 1 Jul 76)


B.  Open-Ended  Items 


1.  Definition  and  Examples 

Open-ended items are those which permit respondents to express
their answers to the questions in their own words, and to indicate
any qualifications they wish. They are like general questions
asked in an unstructured interview. By contrast, in a closed-end
item, all the answers/choices/responses permitted are displayed,
and respondents need only to check their preferred choices.
Examples of open-ended items are shown in Figure IV-B-1.


Figure IV-B-1

Examples of Open-Ended Items

1. Describe any problems you experienced in moving through the
test course while wearing the new PRC-99 radio harness.


2. The M16 rifle is:


3.  What  do  you  think  of  the  AR-15  rifle  sight? 


2.  Advantages  of  Open-Ended  Items 

a. Questions with open-ended response formats allow the
respondents considerable latitude in their responses.

b. Open-ended items allow for the expression of middle opinions
that closed-end items with two choices would not.

c. Open-ended items allow for the expression of issues of concern
that may not have been identified by the question writer.

d. Open-ended items allow researchers to obtain answers that are
unanticipated; unique information may be provided.

e. Open-ended items are very easy to ask. This is useful when the
question writer either does not know, or is not certain about,
the range of possible alternative answers.




f. With an open-ended question, it is possible to find out what is
salient to the respondents, what their frame of reference is,
and how strongly they feel.

g. Open-ended questions permit respondents to describe more
closely and fully their real views.

h. There are times when more valid answers may be obtained from
open-ended than closed-end items. For example, there may be a
tendency for respondents to inflate yearly income figures.
Providing response alternatives may result in an even greater
inflation.

i. Answers to open-ended questions may be useful when treated as
anecdotal material.

j. Respondents like the opportunity to answer some questions in
their own words.

3. Disadvantages of Open-Ended Items

a. Open-ended items are time consuming for the respondent.

b. Open-ended questions which are self-administered and/or
group-administered place a burden on the reading and writing skills
of the respondent.

c. Asking people to answer questions in their own words increases
the task difficulty, and can affect the rate of response. For
example, respondents may say that they have no problems rather
than taking the time to write out what the problems are. Item
1 in Figure IV-B-1 is poor in this respect, but Item 2 is
worse.

d. Only highly motivated respondents will take the time to write a
complete answer to each question.

e. Open-ended items often leave the respondents on their own to
determine what is relevant in the evaluation. For instance,
Item 2 in Figure IV-B-1 leaves the respondents to determine
what is relevant in evaluating the M16 rifle. This is
inappropriate. Open-ended questions should not be used to bypass the
understanding of operations that the questionnaire writer
should have or should acquire before preparing the final
version of the questionnaire.

f. Questionnaires that use closed-end items are generally more
reliable than those using open-ended items.


IV-B Page 3
8 Mar 85
(s. 1 Jul 76)

g. Open-ended questions, answered by motivated respondents, are
capable of overloading data analysts. The responses usually cannot
be handled by machine analysis methods without lengthy preliminary
steps. Analysis of the responses to an open-ended question
usually must be done by someone who has substantial knowledge
about the question's content, rather than by a statistical
clerk. The responses are often difficult to code for analyses.
Thus, the data analysis task can grow into a major project and
problem.

h. Open-ended questions may be easier to misinterpret since the
respondent does not have a set of response alternatives
available which might in themselves provide the proper frame of
reference.

i. Much of the material obtained from an open-ended question may
be repetitious or irrelevant.

j. Since open-ended questions are more time consuming, a
constraint is placed on the number of questions that can be asked.

k. Open-ended questions are more subject to interviewer variations
than are closed-end questions.

l. Open-ended items are often harder for the respondent to answer
than closed-end questions. For example, respondents, when
asked their annual income, may have to struggle to come up with
relatively specific figures, whereas when response alternatives
are presented, they need only indicate one of a number of
ranges of income.

m. Inadvertent phrasing of open-ended questions can sometimes
modify responses in unrecognized and unintended ways. It is
difficult to predict in advance which words will bias an item.
Subtle words appear to cause more distortion than blatantly
biasing words.

4.  Recommendations  Regarding  Use 

a. Open-ended questions should be rarely used and, even then, such
questions should sharply focus respondents' attention and
thereby reduce their writing burden.

b. Closed questions are better for self-administered
questionnaires than open questions.

c. In situations where time and money constraints are paramount,
it would be more appropriate to use closed questions.




IV-B Page 4
8 Mar 85
(s. 1 Jul 76)

d. Closed questions are preferred for surveys where the responses
would more likely be dichotomous.

e. For collecting nominal data, the researcher has a choice about
whether to ask open-ended or closed-end questions.

f. When responses can be obtained by degree (for example, strongly
agree to strongly disagree), a closed-end question would be
superior to an open-ended question.

g. Sometimes a good procedure is to use an open-ended question
with a small number of respondents as a pretest, in order to
find out what the range of alternatives is. It may then be
possible to construct good closed-end questions that will be
faster to administer and easier to analyze.

h. Open-ended questions are most useful when there are too many
possible responses to be listed or foreseen; when it is
important to measure the saliency of an issue to the respondent; or
when a rapport-building device is needed in an interview.

i. To obtain in-depth information on various content areas, a more
focused and guided approach would be the use of an interview
with open questions.

j. Use long open questions with familiar wording for questions
with potentially threatening content.

k. It is sometimes useful to include one or more open-ended
questions along with closed-end questions in order to obtain
verbatim responses or comments that can be used to provide "flavor"
of responses in a report.
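
The pretest procedure in recommendation g can be sketched in a few
lines. This is a minimal illustration, not a procedure prescribed by
this manual: it assumes that someone familiar with the content has
already assigned a short code to each pretest answer, and the codes,
the cutoff `min_count`, and the function name `candidate_alternatives`
are all hypothetical. Codes given by at least `min_count` respondents
become candidate response alternatives; rarer codes are folded into
"Other".

```python
from collections import Counter


def candidate_alternatives(coded_answers, min_count=2):
    """Suggest closed-end alternatives from coded pretest answers."""
    counts = Counter(coded_answers)
    # Keep frequently mentioned codes, most frequent first.
    kept = [code for code, n in counts.most_common() if n >= min_count]
    # Offer a catch-all when some codes were too rare to list.
    if any(n < min_count for n in counts.values()):
        kept.append("Other")
    return kept


# Hypothetical codes assigned to seven open-ended pretest answers.
coded = ["comfort", "protection", "comfort", "weight",
         "protection", "comfort", "stability"]

alternatives = candidate_alternatives(coded)
print(alternatives)
```

The cutoff is a judgment call; a small pretest may justify keeping any
code that appears more than once, while a larger one may call for a
higher threshold.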


IV-C Page 1
8 Mar 85
(s. 1 Jul 76)


C.  Multiple  Choice  Items 

1.  Definition  and  Examples 


In a multiple choice item, the respondent's task is to choose the
appropriate or best answer from several given answers or options.
As used here, multiple choice items include dichotomous or
two-choice items as special cases. And, since only the permitted
answers are available for selection, the multiple choice item may
also be termed a closed-end item.

Examples of multiple choice items are shown in Figure IV-C-1.
Items 3, 4, and 5 are dichotomous, i.e., provide two response
alternatives.

A comparison of true-false items with nondichotomous multiple
choice items is made in Section VI-G, since the comparison bears on
the number of response alternatives.

2.  Advantages  of  Multiple  Choice  Items 

a. As seen in item 2 of Figure IV-C-1, the questionnaire writer
may select different numbers of response alternatives depending
upon knowledge of the respondent's experience or depending upon
the decision to allow or disallow respondents to "sit on the
fence" by including a "no preference" alternative. (See
Section VI-C for wording of items, and Section VI-G regarding the
number of response alternatives to employ.)

b.  Responses  are  more  reliable  when  response  alternatives  are 
provided  for  respondents. 

c.  Interpretation  of  responses  is  more  reliable  when  response 
alternatives  are  provided  to  respondents. 

d.  Dichotomous  items  are  relatively  easy  to  develop,  and  permit 
rapid  analyses. 

e.  Complex  questions  can  often  be  broken  down  into  two  or  more 
simpler  questions. 

f.  Multiple  choice  items  are  easily  scored,  which  means  that  data 
analysis  is  a  relatively  inexpensive  process  requiring  no 
special  content  expertise. 

g. Multiple choice items require considerably less time per
respondent answer than open-ended items.

h.  Multiple  choice  items  put  all  persons  on  the  same  footing  when 
answering.  That  is,  each  person  will  be  able  to  consider  the 
same  range  of  alternatives  when  choosing  an  answer. 

i.  Multiple  choice  items  are  easy  to  administer. 




Figure  IV-C-1 

Examples  of  Multiple  Choice  Items 

1. What do you consider the most important characteristic of a
good helmet? (Check one)

_____ Comfort
_____ Stability
_____ Utility for wash basin
_____ Protection
_____ Weight

2. Which do you prefer, the M16 or the M14 rifle? (Check one)

_____ M14
_____ M16
_____ No preference

3. Were you able to fire effectively from the frontal parapet
emplacement?

_____ Yes _____ No

4. Which do you prefer, the ABC helmet or the XYZ helmet?

_____ ABC helmet _____ XYZ helmet

5. The M16 is a better rifle than the M14.

_____ True _____ False

6. What is your marital status?

_____ Single
_____ Married
_____ Divorced
_____ Other (e.g., separated, widowed, etc.)




IV-C Page 3
8 Mar 85
(s. 1 Jul 76)


3. Disadvantages of Multiple Choice Items


a. Dichotomous items force the respondents to make a choice even
though they may feel there are no differences between the
alternatives, or they do not know enough about either to validly
choose one. Furthermore, respondents are not permitted to say
how much better one alternative is than the other.

b. Two alternatives might not be enough for some types of
questions. The question designer may oversimplify an issue by
forcing it into two categories.

c. There may be a tendency for respondents to choose an answer on
the basis of a response set. (See Chapter XII.)

d. Unless care is taken in the construction of multiple choice
items, the response alternatives may overlap.

e. The question maker has to know the full range of significant
possible alternatives at the time the multiple choice question
is formulated.

f. Multiple choice items must be worded with very great care.
Otherwise, the information obtained may not be valid.

g. With dichotomous items, any slight language difficulty or
misunderstanding of even one word could change the answer from
one extreme to another.
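
When the response alternatives are numeric ranges, the overlap
described in disadvantage d can be checked mechanically. The following
is a minimal sketch under that assumption only; the function name
`adjacent_overlaps` and the income ranges are hypothetical and are not
drawn from this manual. Each alternative is written as an inclusive
(low, high) pair.

```python
def adjacent_overlaps(ranges):
    """Return neighboring (low, high) ranges that share at least one value.

    Ranges are inclusive at both ends. After sorting by lower bound,
    any overlap in the set appears between some pair of neighbors.
    """
    ordered = sorted(ranges)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]


# Hypothetical annual income alternatives; the first two both contain 10000.
income_ranges = [(0, 10000), (10000, 20000), (20001, 30000)]

problems = adjacent_overlaps(income_ranges)
print(problems)
```

An empty result means that no respondent could legitimately check two
of the listed ranges for the same value.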

4. Recommendations Regarding Use

a. For some purposes, the dichotomous question (two response
alternatives) may be an improvement over the open-ended question
in that it provides for faster and more economical analysis of
data. However, it requires more care in its development.

b. Generally speaking, dichotomous multiple choice questions
should be avoided. If used, they should probably be followed
up to determine the reason for a given response.

c. Nondichotomous multiple choice items are popular and have wide
utility. They are recommended for general use as appropriate.

d. Forced response and multiple choice items are desired when
measuring soft data such as opinions. Checklists are
recommended for hard data such as physical aspects of a job analysis
or a broad generalization for measuring opinions prior to a
later survey.




IV-C  Page  4 
8  Mar  85 

e. The development of questionnaire items should include pilot
testing using open-ended items which are later converted to
multiple choice items.

f. No one scaling format has consistently been superior to
another. Rating scales need to be evaluated on other criteria
than number of scale points, vertical and horizontal formats,
and unipolar or bipolar scales.

g. Prior to multiple choice format selection, the type of
measurement scale and data analysis should be identified.

h. Multiple choice items represent measurement scales which are
nominal, ordinal, or interval. These measurement categories
indicate the rules for assigning numbers to the data so that
the appropriate statistical analyses can be performed.

i. Ordinal measurement scales are common in surveys where
respondents are required to rank items or to use a paired-comparison
method.

j. One item cannot adequately cover a topic area. It is necessary
to develop many items to avoid obtaining only surface facts,
and to provide the researcher with a deeper understanding of
the relevant experience of the respondents.

k. Multiple choice items can be developed which measure higher
order objectives.

l. If multiple questions are asked about different possible
responses to a problem, separate specific questions that can be
understood by all respondents and easily interpreted are
required.

m. The length of an item may possibly modify the response style.
Researchers may wish to develop alternate versions of
questionnaire items where the different versions are of different
lengths. This would allow comparison of the effect of item
length on responses.
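
Recommendations g and h link the measurement scale to the analysis
that follows. The sketch below illustrates one common textbook pairing
of scale type with a summary statistic (mode for nominal data, median
for ordinal data, mean for interval data). That pairing is an
assumption brought in for illustration and is not stated in this
manual; the data and the names `SUMMARY_BY_SCALE` and `summarize` are
hypothetical.

```python
import statistics

# Assumed pairing of measurement scale with a summary statistic.
SUMMARY_BY_SCALE = {
    "nominal": statistics.mode,     # most frequent category
    "ordinal": statistics.median,   # middle value of ordered data
    "interval": statistics.mean,    # arithmetic average
}


def summarize(values, scale):
    """Summarize responses with the statistic assumed for the scale."""
    return SUMMARY_BY_SCALE[scale](values)


# Hypothetical responses: a nominal item and a five-point rating item.
marital = ["single", "married", "married", "divorced"]
ratings = [5, 4, 4, 2, 1]

print(summarize(marital, "nominal"))
print(summarize(ratings, "ordinal"))
```

Identifying the scale before selecting the item format, as
recommendation g advises, keeps the analyst from computing a statistic
that the data cannot support.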




D. Rating Scale Items

1. Definitions and Examples

Rating  scale  items  are  a  variation  of  multiple  choice  items.  They 
are  a  means  of  assigning  a  numerical  value  to  a  person's  judgment 
about  some  object.  They  call  for  the  assignment  of  responses 
either  along  an  unbroken  continuum  or  in  ordered  categories  along 
the  continuum.  The  end  result  is  the  attachment  of  numbers  to 
those  assignments.  Ratings  may  be  made  concerning  almost  anything, 
including  people,  groups,  ourselves,  objects,  and  systems. 

There are a number of different forms of rating scale items, only
two of which are shown here. Figure IV-D-1 shows examples of
"numerical" scales. In item 1, a sequence of defined numbers is
provided for the respondent.


Figure IV-D-1

Examples of Numerical Rating Scale Items

1. The cleaning kit for the M16 rifle is

_____ 7 very easy to use.
_____ 6 quite easy to use.
_____ 5 fairly easy to use.
_____ 4 borderline.
_____ 3 fairly difficult to use.
_____ 2 quite difficult to use.
_____ 1 very difficult to use.

2. How satisfied or dissatisfied are you with the type of
furniture in the barracks?

_____ Very satisfied
_____ Satisfied
_____ Borderline
_____ Dissatisfied
_____ Very dissatisfied

3. The training that I have received at Fort Hood has been

_____ very challenging.
_____ challenging.
_____ borderline.
_____ unchallenging.
_____ very unchallenging.


IV-D Page 2
1 Jul 76

The respondents are to indicate which defined number best fits
their judgment about the object to be rated. Sometimes, the
numbers are not shown on the form used by the respondent (e.g., Items
2 and 3). Instead, the respondent reports in terms of descriptive
cues and the numbers are attached later during analysis. The
numbers assigned are in an arithmetic sequence, such as 5, 4, 3, 2,
1, depending upon the number of response alternatives used. They
are usually assigned arbitrarily unless the response alternatives
have been scaled using one of the procedures described in Section
V-B. The order of perceived favorableness of commonly used words
and phrases is discussed in Chapter VIII.
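
Attaching numbers to descriptive cues during analysis can be sketched
as a simple lookup. The example below is illustrative only: it uses
the response alternatives of Item 2 in Figure IV-D-1 with the
arithmetic sequence 5, 4, 3, 2, 1, and the names `SATISFACTION_CODES`
and `code_responses`, as well as the sample responses, are
hypothetical.

```python
# Arbitrary arithmetic sequence attached to the descriptive cues.
SATISFACTION_CODES = {
    "Very satisfied": 5,
    "Satisfied": 4,
    "Borderline": 3,
    "Dissatisfied": 2,
    "Very dissatisfied": 1,
}


def code_responses(responses, codes=SATISFACTION_CODES):
    """Replace each checked alternative with its assigned number."""
    return [codes[response] for response in responses]


# Hypothetical alternatives checked by four respondents.
checked = ["Satisfied", "Very satisfied", "Borderline", "Dissatisfied"]

coded = code_responses(checked)
print(coded)
```

Because the numbers are assigned arbitrarily, equal numeric steps do
not by themselves show that respondents perceive the alternatives as
equally spaced.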

Figure IV-D-2 shows an example of a graphic rating scale. In the
graphic scale, the descriptors are associated with points on a line
or graph, and the respondent indicates a judgment by marking the
point on the line which best fits the rating of the object. The
line can be either horizontal or vertical. The graphic scale
allows the respondent to place a judgment any place on the line.
Thus, the respondents are not confined to discrete categories as
they are with the numerical scale. The graphic scale is, however,
more difficult to score, but scoring can be facilitated with a
stencil which divides the line into segments to which numbers are
assigned.
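
The stencil method of scoring can be expressed as a short calculation.
The sketch below is an illustration under stated assumptions: the line
is divided into equal segments, segment numbers increase from the
starting end of the line, and the mark's position is measured in
millimeters from that end. The function name `stencil_score` and the
measurements are hypothetical.

```python
def stencil_score(mark_mm, line_length_mm, segments):
    """Return the number (1..segments) of the segment containing the mark."""
    if not 0 <= mark_mm <= line_length_mm:
        raise ValueError("mark lies off the scale line")
    segment_width = line_length_mm / segments
    index = int(mark_mm // segment_width)
    # A mark exactly at the far end belongs to the last segment.
    return min(index, segments - 1) + 1


# Hypothetical 100 mm line scored with a five-segment stencil.
print(stencil_score(45, 100, 5))
print(stencil_score(100, 100, 5))
```

Scoring by segment trades away some of the fineness the graphic scale
allows in exchange for faster and more consistent scoring.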

The number of response alternatives to use is discussed in Section
VI-G, the order of response alternatives in Section VI-H, and
response anchoring in Chapter VII.


Figure IV-D-2

Example of Graphic Rating Scale Item

1. Place an X at the point on the scale that most clearly
represents your opinion about the cleaning kit for the M16 rifle.


[Graphic rating scale: a line with tick marks and verbal anchors at
points along it. The anchor labels are not legible in this copy.]


IV-D Page 3
8 Mar 85

Figure IV-D-3 shows examples of continuous scales.

Continuous scales are usually thought of as straight lines with no
indications of any differentiation along the scale lines. A
continuous scale can provide the respondent with guidance as to the
directionality of the rating, and offer the respondent greater
discrimination as to ratings along the scale line. Continuous
scales have been used in ergonomics to rate perception of a thermal
stimulus as well as to rate perception of tones.


Figure  IV-D-3 

Examples  of  Discrete  and  Continuous  Scales 
Used  to  Rate  Perception  of  Tones 


7 LABELS

Extremely Close | Very Close | Quite Close | Closer to neither one
or other | Quite Close | Very Close | Extremely Close

CATEGORIES

CONTINUOUS

[Scale line drawings for the three panels are not legible in this
copy.]




IV-D Page 4
8 Mar 85
(s. 1 Jul 76)


2. Advantages of Rating Scale Items


a. When properly constructed, the rating scale reflects both the
direction and degree of attitude or opinion, and the results
are amenable to analysis using conventional statistical
procedures.

b. Graphic rating scales allow for as fine a discrimination as the
respondent is capable of giving, and the fineness of scoring
can be as great as desired.

c. Rating scale items usually take less time to answer than do
other types of items.

d. Rating scale items can be applied to almost anything.

e. Continuous scales may at times yield greater discrimination by
raters.

f. Rating scale items are generally more reliable than dichotomous
multiple choice items. They may be more reliable than
paired-comparison items.

g. Manipulation of the anchors does not appear to greatly affect
the results. The inadvertent use of mismatching antonyms with
partial antonyms to anchor a rating scale may not jeopardize
the reliability of the scale.

3.  Disadvantages  of  Rating  Scale  Items 

a. Rating scale items are more vulnerable to biases and errors
than other types of items such as forced choice items.

b. Graphic rating scales are harder to score than other types of
items. With a graphic scale item format, the verbal anchors
are associated with points on a line, and the respondents
indicate their judgment by marking the point on the line which best
represents their judgment. Considerable effort and time are
required to measure the pencil mark's exact location to the
nearest portion of the line.

c. The results obtained from the use of graphic rating scale items
may imply a degree of precision/accuracy which is unwarranted.




IV-D Page 5
8 Mar 85
(s. 1 Jul 76)


4. Recommendations Regarding Use

a. The use of rating scale items is highly recommended for most
questionnaires.

b. Rating scales present the sentence (stem) first, and require
the respondent to select a response alternative to complete the
sentence. The stem is supposed to be neutral so that the
response alternatives contain different combinations of
directionality (positive or negative) and intensity.

c. Scales having apparently equal intervals should be employed.
The respondent will assume or perceive that the distances
between adjacent scale points are equal.

d. Numbers can be presented along with verbal anchors.

e. Applications which require greater discrimination could use
scales with more than five or six categories, or with
continuous lines.

f. It is possible to develop and apply a continuous scale without
affecting the psychometric properties of the scale. Continuous
scales appear to be equivalent to traditional scales with
discrete categories.

g. Minor violations in the technique of scale development for
bipolar anchors, such as quasi-polar anchors and phrases for
anchors, do not appear to threaten the reliability of the
instrument. Therefore, it is possible to establish new
versions for bipolar anchors.


IV-E Page 1
8 Mar 85


E.  Behavioral  Scale  Items 


1. Definition and Examples

Behavioral scale items are derived from the compilation of critical
incidents (whether really critical or not). They were developed to
encourage raters to observe behavior more accurately. Behavioral
scales have evolved using different developmental procedures with
divergent scaling foundations associated with Likert, Thurstone,
and Guttman scales. There are a variety of behavioral scales such
as Behaviorally Anchored Rating Scales (BARS), Behavioral
Expectation Scales (BES), Behavioral Observation Scales (BOS), and
Mixed Standard Scales (MSS).

Behavioral scales have customarily been used to evaluate individual
performance on the job. Other applications include assessing
morale, and supporting decisions about the effectiveness of
maintenance trainer equipment and actual equipment training.

Even though developmental procedures vary according to the type of
behavioral scale, there are some commonalities. Behavioral scales
are built on large numbers (in the hundreds) of critical incidents
which are reduced in number by being fitted into performance
dimensions and/or categories. There must be a specified level of
agreement (usually somewhere between 60% and 80%) to retain a critical
incident for inclusion in the scale. The critical incidents are
anchored to the scale. Critical incidents describe a continuum of
effective and ineffective behavior.
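
The agreement criterion for retaining a critical incident can be
sketched as follows. This is a minimal illustration, not the procedure
of any particular behavioral scale: it assumes that each judge assigns
the incident to one performance dimension and that agreement is the
share of judges choosing the most common dimension. The function name
`retain_incident`, the default threshold of 0.60, and the judge
assignments are hypothetical; the dimension names are borrowed from
Figure IV-E-1.

```python
from collections import Counter


def retain_incident(assignments, threshold=0.60):
    """Return (dimension, agreement) if agreement meets the threshold.

    `assignments` holds one dimension name per judge. Returns None
    when the most common dimension falls short of the threshold.
    """
    counts = Counter(assignments)
    dimension, votes = counts.most_common(1)[0]
    agreement = votes / len(assignments)
    return (dimension, agreement) if agreement >= threshold else None


# Hypothetical assignments of one incident by ten judges.
judges = ["Safety"] * 7 + ["Attitude"] * 3
print(retain_incident(judges))

# An evenly split incident is dropped.
print(retain_incident(["Safety"] * 5 + ["Attitude"] * 5))
```

Raising the threshold toward 80% yields fewer but less ambiguous
incidents for anchoring the scale.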




IV-E  Page  2 
8  Mar  85 

Procedures for constructing behavioral scale items, and evaluative
comments about them, can be found in a number of sources including
the following:

a. Bernardin, H. J., & Smith, P. C. (1981). A clarification of
some issues regarding the development and use of behaviorally
anchored rating scales. Journal of Applied Psychology, 66(4),
458-463.

b. Borman, W. C. (1979). Format and training effects on rater
accuracy and rater errors. Journal of Applied Psychology,
64, 410-421.

c. Katcher, B. L., & Bartlett, C. J. (1979, April). Rating errors
of inconsistency as a function of dimensionality of behavioral
anchors (Research Report No. 84). College Park, MD:
University of Maryland, Department of Psychology. (DTIC No. AD
A068922)

d. Kingstrom, P. O., & Bass, A. R. (1981). A critical analysis of
studies comparing behaviorally anchored rating scales (BARS)
and other rating formats. Personnel Psychology, 34, 263-289.

e. Landy, F. J., & Barnes, J. L. (1979). Scaling behavioral
anchors. Applied Psychological Measurement, 3(2), 193-200.

f. Latham, G. P., Fay, C. H., & Saari, L. M. (1979). The
development of behavioral observation scales for appraising the
performance of foremen. Personnel Psychology, 32, 299-311.

g. Motowidlo, S. J., & Borman, W. C. (1977). Behaviorally
anchored scales for measuring morale in military units. Journal
of Applied Psychology, 62(2), 177-183.

h. Murphy, J. H. (1980). Use of behaviorally anchored rating
scales (BARS) to complement the management by objectives (MBO)
and fitness report components of the Marine Corps performance
evaluation system. Master of Military Arts and Sciences (MMAS)
thesis prepared at U.S. Army Command and General Staff College,
Fort Leavenworth, KS. (DTIC No. AD A097694)

Examples of behavioral scale items and dimensions are shown for
BARS, BES, BOS, and MSS in Figures IV-E-1 through IV-E-4.




IV-E  Page  3 
8  Mar  85 


Figure  IV-E-1 

Examples  of  BARS's  Seven  Dimensions 
Describing  Technician  Behavior 


1. Safety: Behaviors which show that the technician understands
and follows safety practices as specified in the technical
data;

2. Thoroughness and Attention to Details: Behaviors which show
that the technicians are well prepared when they arrive on the
job, carry out maintenance procedures completely and
thoroughly, and recognize and attend to symptoms of equipment damage
or stress;

3. Use of Technical Data: Behaviors which show that the
technician properly uses technical data in performance of
maintenance functions;

4. System Understanding: Behaviors which show that the
technicians thoroughly understand system operation allowing them to
recognize, diagnose, and correct problems not specifically
covered in the Technical Orders and publications;

5. Understanding of Other Systems: Behaviors which show that the
technicians understand the systems that are interconnected
with their specific system and can operate them in accordance
with technical orders;

6. Mechanical Skills: Behaviors which show that the technician
possesses specific mechanical skills acquired for even the
most difficult maintenance problems; and

7. Attitude: Behaviors which show that the technician is
concerned about properly completing each task efficiently and on
time.


From Wienclaw, R. A., & Hines, F. E. (1982, November). A model for
determining cost and training effectiveness trade-offs. Training
Equipment Interservice/Industry Training Equipment Conference,
405-416.




IV-E  Page  4 
8  Mar  85 


Figure IV-E-2

Example of BARS Items Representing
Performance and Effort on the Job

Scale
Point    Behavioral Anchor

9        When maintenance mechanics found an error in their
         assembly procedures on an aircraft, they told their
         platoon leaders of their mistake and requested that the
         hangar be open Saturday and Sunday if necessary to meet
         their previously promised Monday delivery.

8        While clearing the brush from an approach to an
         airport, these dozer operators never shut the dozer off,
         running in shifts right through lunch.

7        This section was asked to prepare a set of firing
         charts by a specific time. The charts were finished
         ahead of time.

6        Although this section was constantly called upon for
         typing tasks, the work was done with few mistakes and
         on a timely basis.

5        The men in this unit did not push for top performance,
         although they did their jobs and kept busy.

4        Many troops in this unit would leave the post as
         quickly as possible after duty hours to avoid doing any
         extra work.

3        The service section of a support unit had a large
         backlog of equipment needing repair. All enlisted
         personnel assigned to this section appeared to be busy,
         but their output was very low compared to the other
         service sections.

2        The men in this section signed out weapons to be
         cleaned but sat around and "shot the bull" until it
         was time to turn the weapons back in.

1        During one period, these enlisted personnel slowed
         their work down and made mistakes that cost time and
         new parts. They were working 7-day weeks, but at the
         end of the period, they were accomplishing only the
         same amount of work in 7 days that they had been
         accomplishing before in 5 days.

From Motowidlo, S. J., & Borman, W. C. (1977). Behaviorally anchored
scales for measuring morale in military units. Journal of Applied
Psychology, 62(2), 177-183.




Figure IV-E-3


Example  of  BOS  Item  Representing 
Description  of  Foreman's  Job 


Tells  crew  to  inform  him  immediately  of  any  unsafe  condition. 
Almost  Never  1  2  3  4  5  Almost  Always 


From Latham, G. P., Fay, C. H., & Saari, L. M. (1979). The development
of behavioral observation scales for appraising the performance of
foremen. Personnel Psychology, 32, 299-311.


Figure  IV-E-4 

Example  of  MSS  Items  Representing 
Highway  Patrol  Stopping  Vehicles  for  Violations 


0  Stops  vehicles  for  a  variety  of  traffic  and  other  violations. 

0  Concentrates  on  speed  violations,  but  stops  vehicles  for  other 
violations  also. 

0 Concentrates on one or two kinds of violations and spends too
little time on others.


From Rosinger, G., Myers, L. B., Levy, G., Loar, N., Nohrman, S. A., &
Stock, R. (1982). Development of behaviorally based performance
appraisal system. Personnel Psychology, 35, 75-88.




2.  Advantages  of  Behavioral  Scale  Items 

a. Raters may not be cognitively prepared to summarize and abstract
accurately. More reliable ratings may be obtained on
behavioral scales by using the jargon of raters, and by having
raters maintain observational diaries.

b. It has been found that it is possible to generalize a Behaviorally
Anchored Rating Scale (BARS) instrument for use with
similar populations in other organizations where the same types
of tasks are being performed.

c. Behavioral Expectation Scales (BES) can be used to clarify
organizational policy, provide feedback, assess and improve
individual performance, and identify divergent perceptions.

d. Training programs of three hours and longer have the potential
to increase rater accuracy.

e. In situations where there is concern about halo and leniency
errors, Mixed Standard Scales (MSS) would be appropriate to use
if the developmental procedures are thorough.

3.  Disadvantages  of  Behavioral  Scale  Items 

a. The time and effort involved in developing behavioral scale
items may not be worth the investment unless there are other
spin-offs for the use of this type of scale.

b. Behavioral scales require quantification of items using a
sample size of several hundred people; they should not be based
on small samples.

c. More items are generated for behavioral scales when the number
of dimensions is increased. For example, there is the potential
for nine dimensions to have up to 90 items or more.

d. Raters appear to prefer a BARS format over a MSS format. It
would probably not be useful to construct a MSS unless halo and
leniency errors were anticipated.




4. Recommendations Regarding Use

a.  Scale  development  procedures  will  be  strengthened  if  rater 
participation  is  included  for  BARS  as  well  as  other  behavioral 
scale  formats. 

b. BARS development procedures have resulted in a disproportionate
rejection of mid-range items. Simple item intercorrelation
procedures for the U (universe score procedure) would increase
the number of mid-range items. (DeCotiis, T. A. (1978). A
critique and suggested revision of behaviorally anchored rating
scales developmental procedures. Educational and Psychological
Measurement, 38, 681-690.)

c. Rigor in the developmental procedures for constructing various
types of behavioral scales will influence and increase the
reliability and validity of the scales more than the format.

d. There appears to be a tendency to confound Thurstone scaling
procedures with Likert scaling procedures, which diminishes
levels of reliability and validity for Thurstone scales.
Researchers need to be aware of the differences between Thurstone
and Likert scale development procedures when they are
constructing BARS, BES, and BOS behavioral scales.

e. To increase the MSS format acceptance by raters for the scoring
system and item dimensionality, a coding system with face
validity may be useful as well as training for the raters to
explain the MSS rationale, and the procedures for carrying out
the appraisal.

f.  MSS  requires  statistical  analysis  to  ensure  unidimensionality 
of  the  scales. 




IV-F  Page  1 
8  Mar  85 
(s.  1  Jul  76) 


F.  Ranking  Items 

1.  Definition  and  Examples 

Ranking items call for the respondent to indicate the relative
ordering of the members of a presented group of objects on some
presumably discriminable dimension, such as effectiveness, saltiness,
overall merit, etc. By definition, one does not have a scale
by which the amount of difference between successive members is
measured, nor is it implied in rank ordering that successive
differences are even approximately equal. If respondents were being
asked to give judgments on the size of intervals, the item would be
something more than a ranking item.

Multiple choice items are so frequently used that one may inadvertently
use this format when the ranking item format would provide
more complete and reliable information. Item 1 in Figure IV-C-1
illustrates this point. Since a preponderance of respondents would
check "protection" as a helmet's most important characteristic,
only a small remainder of responses would be available as a basis
for ordering the other characteristics. Some of the other characteristics
might be achievable without sacrificing protection, so it
would be desirable to have a reliable ordering of their importance.

As the number of objects to be ranked increases, the difficulty of
assigning a different rank to each object increases even faster.
This means that reliability (repeatability) is reduced. To counter
this, one may explicitly permit respondents to assign tied rankings
to objects when the number of objects exceeds, say, 10.

Examples of ranking items are shown in Figure IV-F-1.

There have been instances when rank order scaling procedures have
been integrated with other complex systems. An illustration of
this is the delta scalar method used by the U.S. Navy and the Air
Force Aerospace Medical Research Laboratory. The delta scalar
method is a complex system of rank ordering found in the Mission
Operability Assessment Technique and Systems Operability Measurement
Algorithm (U.S. Navy), and the Subjective Workload Assessment
Technique (U.S. Air Force). These systems involve establishing a
rank order scale that is converted to an interval scale. Procedures
and recommendations for constructing rank ordering embedded
in subjective workload assessment methods can be found in a number
of sources including:

a. Eggemeier, F. T., Crabtree, M. S., & La Point, P. A. (1983,
October). The effect of delayed report on subjective ratings
of mental workload. Proceedings of the Human Factors Society
27th Annual Meeting, 139-143.

b. Eggemeier, F. T., Crabtree, M. S., Zingg, J. J., Reid, G. B., &
Shingledecker, C. A. (1982). Subjective workload assessment in
a memory update task. Proceedings of the Human Factors Society
26th Annual Meeting, 643-647.



c. Eggemeier, F. T., McGhee, J. Z., & Reid, G. B. (1983, May).
The effects of variations in task loading on subjective workload
rating scales. Proceedings of the IEEE 1983 National
Aerospace and Electronics Conference, Dayton, OH, 1099-1105.


Figure  IV-F-1 
Examples  of  Ranking  Items 

1. Rank the following three methods of issuing starlight scopes to
an infantry squad. Assign a "1" to the most effective, a "2" to
the second most effective, etc. Do not assign tied rankings.

Ranking   Basis of Issue

_______   Scopes issued to AMG and SL

_______   Scopes issued to AMG, SL, and one rifleman

_______   Scopes issued to all squad members

2. How important are each of the following factors to you? Assign
a "1" to the most important, "2" to the second most important,
etc. Assign a different number to each of the four factors.

_______   Type of furniture in the barracks

_______   Army pay

_______   Medical service to soldiers

_______   Choice of duty station
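
As an illustration of how responses to a ranking item such as item 2
might be summarized, the following Python sketch averages the rank
each factor receives across respondents. It is a minimal example
under stated assumptions, not a procedure prescribed by this manual:
the function name mean_ranks, the short factor labels, and the three
sample responses are all hypothetical.

```python
def mean_ranks(rankings):
    """Summarize untied rankings from several respondents.

    rankings: list of dicts mapping item -> rank (1 = most important).
    Returns a list of (item, mean rank) pairs, lowest mean rank first.
    """
    items = sorted(rankings[0])
    expected = list(range(1, len(items) + 1))
    for ranking in rankings:
        # Each respondent must rank every item exactly once, with no ties.
        if sorted(ranking) != items or sorted(ranking.values()) != expected:
            raise ValueError("each respondent must rank every item once, no ties")
    n = len(rankings)
    means = {item: sum(r[item] for r in rankings) / n for item in items}
    return sorted(means.items(), key=lambda pair: pair[1])


# Hypothetical responses from three soldiers to item 2.
responses = [
    {"pay": 1, "medical": 2, "duty_station": 3, "furniture": 4},
    {"pay": 2, "medical": 1, "duty_station": 3, "furniture": 4},
    {"pay": 1, "medical": 3, "duty_station": 2, "furniture": 4},
]
summary = mean_ranks(responses)
for item, mean in summary:
    print(f"{item:13s} {mean:.2f}")
```

A mean rank only orders the factors; consistent with the definition
above, it says nothing about how far apart they are.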


2.  Advantages  of  Ranking  Items 

a.  The  idea  of  ranking  is  familiar  to  respondents. 

b. Ranking takes less time to administer, score, and code than
paired-comparison items do, and there is some evidence that the
results of the two are highly similar.

c.  Ranking  and  rating  techniques  are  generally  comparable  in  terms 
of  reliability. 

3.  Disadvantages  of  Ranking  Items 

a. Ranking items such as item 1 in Figure IV-F-1 do not reveal the
respondent's judgment as to whether any of the objects are
effective or ineffective in an absolute rather than just a
relative sense. To learn this, another question must be asked.



b. Rank order scales originate from ordinal scale measurement.
The categories in a rank order scale do not indicate how much
distance there is between each category. Unequal distances are
assumed. Rank order items do not permit respondents to state
the relative amounts of differences between alternatives.

c. The results from ranking items are open to question if the
basis for ranking was not clear to the respondents.

d. Ranking is generally less precise than rating.

4.  Recommendations  Regarding  Use 

a.  Rank  order  scales  are  appropriate  for  analyzing  data  that  meets 
the  requirements  of  ordinal  measurement  scales. 

b. There are some situations where the intent of the questionnaire
developer is best served with the use of one or more ranking
items. Generally, however, rating scale items are probably
preferable.

c. Rank order scales and rating scales are more cost effective and
time effective to use than paired-comparisons.

d. Individuals tend to more frequently use one end of a list than
the other end while ranking. To counteract this bias, it is
possible to develop two or more versions of the list by randomly
ordering the lists.

e. It is possible to combine rank ordering with other methods,
such as task analysis, to isolate critical components of a job.
This information can be transformed into a performance measurement
system, or can be used to modify military training.

f. Analysis of the data for test-retest reliability performed on
rank order, paired-comparison, and Likert scales varied depending
on whether a Spearman rho or Kendall's tau was used.
Kendall's tau may be a more appropriate measure of reliability
for rank order measures.
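
To show how the two coefficients named in recommendation f can differ
on the same data, the Python sketch below computes both for a test and
retest ranking of five objects. It assumes untied rankings and uses
the textbook formulas (Spearman's rho from squared rank differences,
Kendall's tau-a from concordant and discordant pairs). The function
names and the sample rankings are hypothetical.

```python
from itertools import combinations


def spearman_rho(x, y):
    """Spearman rho for two untied rankings of the same n objects."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs, no ties."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)  # +1 if concordant, -1 if discordant
    return score / (n * (n - 1) / 2)


# Hypothetical ranks given to five objects on two occasions.
first = [1, 2, 3, 4, 5]
second = [2, 1, 3, 5, 4]

rho = spearman_rho(first, second)
tau = kendall_tau(first, second)
print(f"rho = {rho:.2f}, tau = {tau:.2f}")
```

For these sample data rho is 0.80 and tau is 0.60, so a reliability
figure should always state which coefficient was used.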




G.  Forced  Choice  Items 


IV-G  Page  1 
8  Mar  85 
(s. 1 Jul 76)


1.  Definition  and  Examples 

It would appear that any multiple choice item could also be called
a "forced choice" item because, after all, the respondent is expected
to choose one of the response alternatives. The instructions
and/or the presence of an administrator put some degree of
social pressure - social force - on the respondent. However, if a
multiple choice item includes an "I don't know" response alternative,
the pressure/force is almost totally removed. Likewise, on a
rating scale item, the inclusion of a "neutral" or "borderline"
response category allows the respondents to answer without committing
themselves.

So, for some questionnaire developers - in particular those who
produce "forced choice self inventories" (see references) - a
"forced choice" item strictly refers to one where the respondents
must commit themselves. They may have to select one of a pair of
choices, or two of three, or two of four. These three cases are
illustrated in Figure IV-G-1.

2.  Advantages  of  Forced  Choice  Items 

a. Studies have indicated that reliabilities and validities obtained
from the use of forced choice items compare favorably
with other methods.

b. The forced choice method has been used by a number of investigators
in an attempt to control the tendency of individuals to
answer self-report inventories in terms of response sets rather
than giving "true" responses. (Response sets are discussed in
Chapter XII.)

3.  Disadvantages  of  Forced  Choice  Items 


a.  Respondents  sometimes  balk  at  picking  unfavorable  statements, 
or  at  being  forced  to  make  a  choice. 

b.  Forced  choice  items  take  more  time  to  develop  than  some  other 
types  of  items. 

c.  Paired-comparison  items,  where  all  phrases  are  paired,  take 
more  time  to  administer,  score,  and  code  than  do  ranking  items. 
Results  from  the  two,  however,  may  have  a  linear  relationship. 




Figure IV-G-1

Examples of Forced Choice Items

1. Check one of the following two statements that is more characteristic
of what you like.

_____  I like to travel.

_____  I like to meet new people.

2. Check one of the two following statements that is more characteristic
of yourself.

_____  I am honest.

_____  I am intelligent.

3. Look at the following three activities. Mark an "M" by the one
you like the most, and an "L" by the one you like the least.

_____  Play baseball

_____  Go to the craft shops

_____  Attend boxing or wrestling matches

4. From the following four statements, check the two that are most
descriptive of your unit commander.

_____  Serious-minded

_____  Energetic

_____  Very helpful

_____  Gets along well with others


d. There is some question as to whether forced choice items overcome
the biases or errors they are supposed to correct.

e. Some investigators have concluded that the generalization that
self-report forced choice inventories are more valid than
single stimulus forms of the same tests is not supported by a
critical consideration of the relevant evidence.
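
The administration cost noted in disadvantage c can be made concrete.
When every phrase is paired with every other, n phrases produce
n(n-1)/2 pairs, so 10 phrases already require 45 judgments. The
Python sketch below enumerates the pairs and, as one simple scoring
choice assumed here purely for illustration, orders the phrases by how
often each was preferred. The function names and sample choices are
hypothetical.

```python
from itertools import combinations


def all_pairs(items):
    """Every unordered pair of items: n * (n - 1) / 2 pairs in total."""
    return list(combinations(items, 2))


def order_from_choices(items, choices):
    """Order items by number of times preferred (most preferred first).

    choices maps each pair (a, b) to whichever of a or b was chosen.
    """
    wins = {item: 0 for item in items}
    for pair, preferred in choices.items():
        if preferred not in pair:
            raise ValueError(f"{preferred!r} is not a member of {pair!r}")
        wins[preferred] += 1
    return sorted(items, key=lambda item: wins[item], reverse=True)


statements = ["A", "B", "C", "D"]
pairs = all_pairs(statements)  # 6 pairs for 4 statements

# Hypothetical respondent who always prefers the earlier letter.
choices = {pair: min(pair) for pair in pairs}
ranking = order_from_choices(statements, choices)
print(len(pairs), ranking)
```

Because the number of pairs grows roughly with the square of the
number of phrases, a ranking item becomes the cheaper format quickly
as the list lengthens.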



Procedures for constructing forced choice items, and evaluative
comments about them, can be found in a number of sources including
the following:

a. Guilford, J. P. (1954). Psychometric methods (2nd ed.). New
York: McGraw-Hill.

b. Nunnally, J. C. (1967). Psychometric theory. New York:
McGraw-Hill, pp. 484-485.

c. Sisson, E. D. (1948). Forced choice - the new Army rating.
Personnel Psychology, 1, 365-381.


4. Recommendations Regarding Use

When test participants are deliberately given relevant experience
with the operation of a weapons system, vehicle, or other system,
the "I don't know" response alternative should normally be deleted
from items that seek the participants' evaluations of the system.




IV-H  Page  1 
8  Mar  85 
(s. 1 Jul 76)


H.  Card  Sorting  Items/Tasks 

1.  Definition 

With card sorting items/tasks, the respondents are given a large
number of statements (e.g., 75), each on a slip of paper or card.
They are asked to sort them into, say, nine or eleven piles. The
piles are in rank order from "most favorable" to "least favorable"
or "most descriptive" to "least descriptive," etc., depending upon
the dimension to be used. Each pile usually is to have a specified
number of statements placed into it as required to form a rough
normal distribution. However, some investigators have argued that
forcing a given distribution is not necessary. Ordinarily each
pile is given a score value which is then assigned to the statements
placed into it.

An extensive discussion of the use of card sorts (or, more generally,
Q-technique and its methodology) appears in: Stephenson, W.
The study of behavior. Chicago: University of Chicago Press,
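
The scoring rule in the definition above (each pile carries a score
value that is passed to every statement placed in it) can be sketched
in Python as follows. This is an illustration under stated
assumptions: the quota of statements per pile shown for 75 statements
and nine piles is a hypothetical quasi-normal pattern, piles are
assumed to be ordered from lowest to highest score, and the function
name is invented for the example.

```python
# Hypothetical quotas for 75 statements in nine piles (rough normal shape).
QUOTAS = [3, 6, 9, 12, 15, 12, 9, 6, 3]


def score_card_sort(piles, quotas=None):
    """Assign each statement the score value of the pile it was placed in.

    piles: list of lists of statement ids, ordered from the lowest
           scoring pile to the highest (an assumed ordering).
    quotas: optional required pile sizes; omit for an unforced sort.
    """
    if quotas is not None:
        sizes = [len(pile) for pile in piles]
        if sizes != list(quotas):
            raise ValueError(f"pile sizes {sizes} do not match quotas")
    scores = {}
    for value, pile in enumerate(piles, start=1):
        for statement in pile:
            scores[statement] = value
    return scores


# Build a sample sort: statements 0..74 dealt into piles by quota.
statements = iter(range(sum(QUOTAS)))
piles = [[next(statements) for _ in range(size)] for size in QUOTAS]

scores = score_card_sort(piles, QUOTAS)
print(len(scores), scores[0], scores[74])
```

Passing quotas=None corresponds to the unforced sort that some
investigators prefer; the scoring step is the same either way.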

2.  Advantages  of  Card  Sorting  Items/Tasks 

a. Card sorts appear to be capable of counteracting at least some
of the biasing effects of response sets. (Response sets are
discussed in Chapter XII.)

b. Some investigators believe that card sorting is a fast and
interesting method of obtaining valid and reliable interview
data.

c. With card sorts, the respondents can shift items back and forth
if they wish to do so.

d. The card sort has greatest value when a comprehensive description
by a single individual is desired.

e. Card sorts also have value for obtaining complex descriptions
which can be compared systematically.

f. They can be used to obtain rating information on any issue.

3. Disadvantages of Card Sorting Items/Tasks

a. Card sorting items/tasks may take more time to construct than
other types of items, and they generally take more time to
administer and score.




b. Card sorts are more involved to administer than other types of
questionnaire items.

4. Recommendations Regarding Use

Some authors think that card sorting is the method of choice if
testing time is available. Its greatest value seems to be its
ability to provide a comprehensive description by a single individual,
or to obtain complex descriptions which can be systematically
compared. Since it is more awkward to administer and score than
other types of items, its use in Army field test evaluations is
limited.




IV-I Page 1
8 Mar 85
(s. 1 Jul 76)

I. Semantic Differential Items

1.  Definition  and  Examples 

The semantic differential technique was initially developed as a
general method of measuring meaning, and with it the meaning of a
particular concept to a particular individual can be specified
quantitatively. The technique has also been used to measure attitudes
and values, particularly in the marketing area. In using the
technique, the respondent is presented with a number of bipolar
rating scales, usually but not always having seven points. The two
ends of each scale are defined by adjectives. The respondent is
given a set of such scales, and is asked to rate each of a number
of objects or concepts on every scale. To aid in interpretation,
some scale coding can be used, usually numbers in a direct numerical
sequence such as 1 through 7. Other more extensive scoring can
be used, and results can be factor analyzed to search for the basic
dimensions of meaning. However, the usefulness of the semantic
differential as a research tool stems from the ability of the
procedure to probe into both the content and the relative intensity
of respondents' attitudes.
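
The 1-through-7 coding described above can be sketched in Python. The
example codes each mark by its position counted from the left pole,
offers a helper for reversing a scale whose favorable pole is on the
right, and compares two concepts with the root-sum-of-squares distance
between their profiles, a measure commonly used with this technique.
All names and the two sample profiles are hypothetical, and the choice
of distance measure is an assumption of the sketch.

```python
import math

N_POINTS = 7  # assumed seven-point scales, coded 1 (left pole) to 7 (right pole)


def validate(profile, n_points=N_POINTS):
    """Check that every coded mark falls on the scale; return the profile."""
    for scale, score in profile.items():
        if not 1 <= score <= n_points:
            raise ValueError(f"{scale}: score {score} is off the scale")
    return profile


def reverse_key(score, n_points=N_POINTS):
    """Flip a score so that the opposite pole becomes the high end."""
    return n_points + 1 - score


def profile_distance(a, b):
    """Root-sum-of-squares difference between two concept profiles."""
    if a.keys() != b.keys():
        raise ValueError("profiles must use the same scales")
    return math.sqrt(sum((a[s] - b[s]) ** 2 for s in a))


# Hypothetical coded marks from one respondent for two concepts.
rifle = validate({"reliable-unreliable": 2, "heavy-light": 5, "good-bad": 2,
                  "slow-fast": 6, "adequate-inadequate": 3})
helmet = validate({"reliable-unreliable": 2, "heavy-light": 2, "good-bad": 3,
                   "slow-fast": 4, "adequate-inadequate": 3})

distance = profile_distance(rifle, helmet)
print(f"{distance:.2f}")
```

A smaller distance suggests the respondent sees the two concepts as
more alike; repeating the calculation at two points in time gives a
simple view of change.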

Examples of semantic differential items are given in Figure IV-I-1.
A recommended text on the semantic differential is Osgood, C. E.,
Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning.
Urbana, Ill.: University of Illinois Press. Norms have been
collected on 20 scales for 360 words. They are reported in Jenkins,
J. J., Russell, W. A., & Suci, G. J. (1958). An atlas of semantic
profiles for 360 words. American Journal of Psychology, 71,
688-699.

2.  Advantages  of  Semantic  Differential  Items 

a. Evidence on the validity, reliability, and sensitivity of the
scales has been offered.

b. Using some adjectives that do not seem appropriate to the
concept under investigation may uncover aspects that reflect an
attitude or feeling tone even though the respondent cannot put
it into words.

c. Semantic differential items can be used to study the relative
similarity of different concepts to the respondent, and to
study changes over time.

d. Semantic differential items are relatively easy to construct,
administer, and score.




Figure IV-I-1

Examples of Semantic Differential Items

1. Place an X in each of the following rows to describe your
assessment of the M16 rifle.

Reliable  ___:___:___:___:___:___:___  Unreliable

Heavy     ___:___:___:___:___:___:___  Light

Good      ___:___:___:___:___:___:___  Bad

Slow      ___:___:___:___:___:___:___  Fast

Adequate  ___:___:___:___:___:___:___  Inadequate

2. Place an X in each of the following rows to describe your
assessment of the ABC helmet.

Reliable  ___:___:___:___:___:___:___  Unreliable

Heavy     ___:___:___:___:___:___:___  Light

Good      ___:___:___:___:___:___:___  Bad

Slow      ___:___:___:___:___:___:___  Fast

Adequate  ___:___:___:___:___:___:___  Inadequate

3.  Disadvantages  of  Semantic  Differential  Items 

a. If care is not taken, the two adjectives chosen for the extremes
will not define some kind of scale or dimension between
them.

b. The value of semantic differential items depends on the suitable
choice of the bipolar adjectives and concepts.

c. There is a potential response error present in the respondents'
interpretations of the meaning of the end-point descriptions.
However, there appears to be a balancing out over a number of
administrations.

d. There is the possibility of a socially desirable response set
when personality traits are measured with the semantic
differential.




4.  Recommendations  Regarding  Use 




a. There are a number of investigators that advocate the use of
the semantic differential. Others, however, have questioned
whether it may be a rather complicated way of developing a
measure that is more readily and reliably secured by other
means. It is reasonable to assume that the technique could
easily be expanded to identify attitudes and the intensity of
the attitudes toward the attractiveness of a particular military
specialty, the capacities of a specific piece of equipment
to perform, or any other characteristic set which can be described
by bipolar adjectives. However, since the analysis of
sets of semantic differential items is somewhat involved, the
technique has not been widely used for routine Army field test
evaluations.

b. Semantic space for the concepts of evaluation, potency, and
activity is fairly stable across studies, and has maintained
reliability over time. Because of the stability of the scale,
it is possible to vary instrument format as well as rating
instructions and maintain the viability of the scale. To
ensure the soundness of the scale, developmental procedures
need to include testing the instrument in the context area for
which it was designed.


c. In the early stages of development for the semantic differential,
it is possible to identify potential bipolar anchors
using Roget's Thesaurus as a source in addition to the subjects'
concepts of terms that have semantic stability. Initial
pools of items can be reduced through judgment agreement,
factor analysis, and cluster analysis.

d. Semantic differential scales can be anchored with phrases,
adjectives, or adverbs.

e. The number of scale points used with the semantic differential
can vary, and still retain the integrity of the instrument. An
acceptable range in the scale would be between five and twelve
points. Each completed survey would have all items with the
same number of scale points. For example, two questionnaires
could be designed, one with seven scale categories and the
other with nine scale categories.

f. Social desirability response sets can be controlled by careful
construction of the bipolar scales. Adjectives can be selected
that reflect a common trait to control the influence of social
desirability.




IV-J Page 1
8 Mar 85
(s. 1 Jul 76)


J. Other Types of Items

1. Checklists

Checklists  are  instruments  in  which  responses  are  made  by  checking 
the  appropriate  statement  or  statements  in  a  list  of  statements. 
Examples  are  shown  in  Figure  IV-J-1. 

Figure IV-J-1

Examples of Checklists

1. Which of the following are important to consider when deciding
whether or not to make a career of the Army? Check all that
apply.

_____  Leadership of NCOs

_____  Opportunity for promotion

_____  Playboy magazines in the Post Exchange

_____  Latrine in crafts shops

_____  Army pay

_____  Choice of duty stations

_____  Civilian opinion of Army

_____  Reenlistment bonuses

_____  Hours of work in a work week

2. Please check all the characteristics which Backpack A possesses.

_____  Durability

_____  Lightness

_____  Wearing comfort

_____  Accessibility of items

_____  Ease of putting on and taking off

_____  Other (specify): _____



Checklists can be used in conjunction with interviews to serve as a
cue to the interviewer. Administration of a checklist combined
with an interview of critical areas identified on the checklist
could reduce interviewing time. An example is shown in Figure
IV-J-2.


Figure IV-J-2

Example of Checklist Pertaining to
Equipment Problems

I will name equipment from the LAVM/RV that you may have used to
extract, replace and transport equipment. Please answer Yes or No
to indicate whether or not you experienced any difficulties using
the equipment. I would also appreciate your comments concerning
the difficulties. If you have no experience using the equipment,
then check Not Applicable (NA).

Equipment                     Yes   No    NA    Comment

1. Crane                      ___   ___   ___   __________

2. Crane remote controls      ___   ___   ___   __________

3. Crane onboard controls     ___   ___   ___   __________

4. Winch                      ___   ___   ___   __________

5. Winch controls             ___   ___   ___   __________


This checklist/interview could serve as the foundation for generating
other, more refined instruments. The checklist/interview is
another way of eliciting information from a subject matter expert
group.

Compared to rating scales, which give a numerical value to some
sort of judgment, checklists are relatively crude. They are,
however, quite useful when scaled information is not needed.
Checklists also are useful when information is needed to determine
which of several issues are significant to a respondent. Other
issues regarding the use of checklists are as follows:

a. Checklists should use terms like the respondent uses.

b. Response set can be somewhat controlled if the respondent is
asked to check a stated number of items, or if upper or lower
limits are set.

c. There is some evidence that a higher rate of claim or assertion
is obtained from checklists than from open-ended items.




d. It is usually not known if checklists cover the appropriate
attributes.

e. Adjective checklists are sometimes used, especially to elicit
stereotypes about people or nations. They are similar to
rating scales.
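
As a small illustration of how checklist responses might be reduced,
the Python sketch below counts how many respondents checked each
statement and, following point b above, sets aside any response that
exceeds an upper limit on the number of checks. The function name, the
limit of three, and the sample responses are hypothetical; how to
handle an over-limit response is a choice for the questionnaire
developer, and rejecting it is only one option.

```python
from collections import Counter


def tally_checklist(responses, max_checks=None):
    """Count how many respondents checked each statement.

    responses: list of lists of checked statements, one list per respondent.
    max_checks: optional upper limit; responses over the limit are
                set aside and counted separately rather than tallied.
    Returns (counts, number_of_rejected_responses).
    """
    counts = Counter()
    rejected = 0
    for checked in responses:
        unique = set(checked)  # a statement counts once per respondent
        if max_checks is not None and len(unique) > max_checks:
            rejected += 1
            continue
        counts.update(unique)
    return counts, rejected


# Hypothetical responses to the Backpack A checklist in Figure IV-J-1.
responses = [
    ["Durability", "Lightness"],
    ["Durability", "Wearing comfort", "Lightness"],
    ["Durability"],
    ["Durability", "Lightness", "Wearing comfort", "Accessibility of items"],
]
counts, rejected = tally_checklist(responses, max_checks=3)
for statement, n in counts.most_common():
    print(f"{statement:18s} {n}")
print("rejected:", rejected)
```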

2.  Matching  Items 


With matching items, the respondent is given two columns of items,
and is asked to pair each item in the first column with an associated
item in the second. In general, it is not desirable to have
the same number of items in each column. Both sets of items should
constitute a homogeneous set, and any item in the second column
should look like it could go with any item in the first column.

Matching items are best used in achievement testing. Since they
have little utility in Army field test evaluations, they are not
discussed in greater detail in this manual.

3.  Arrangement  Items 

With an arrangement item, a number of statements are presented in
random order, and the respondent arranges them in a new order
according to his/her judgment and the guidance received. For
example, steps in a sequence of events or procedures may be rearranged
in order of occurrence or performance. Or, causes may be
rearranged in order of importance in bringing about a certain
effect.

There may be some situations where arrangement items may be useful
in Army field test evaluations; however, the scoring of the items
is difficult. The use of such items is, therefore, extremely
limited.

4.  Formats  Providing  for  Supplementary  Responses 

The questionnaire writer is not limited to the major item formats
described in this chapter. Formats providing for supplementary
responses can also be used. Examples are shown in Figure IV-J-3.


IV-J Page 4
8 Mar 85
(s. 1 Jul 76)


Figure IV-J-3

Examples of Formats Providing for Supplementary Responses

1. The starlight scope is able to detect aggressor movements:

_____ very effectively.
_____ effectively.
_____ borderline.
_____ ineffectively.
_____ very ineffectively.

Explain: _______________


2. What style of leadership was used by the most effective squad
leader you served under? (Check one)

_____ democratic and friendly
_____ friendly with most; authoritarian with the others
_____ sometimes authoritarian; sometimes acts like one of the men
_____ usually authoritarian; avoided making close friends
_____ other (please describe) _______________


Notice that the "other" response alternative in Example 2 allows
the respondent in effect to make an open-ended item out of a
multiple choice item. Few test respondents, however, elect to do
this. Inclusion of the supplementary or write-in option commits
you to extra data reduction and analysis effort that would have
been unnecessary had you anticipated and included all reasonable
response alternatives.


V-A Page 1
8 Mar 85
(s. 1 Jul 76)


Chapter V: Attitude Scales and Scaling Techniques

A.  Overview 

At times, the questionnaire developers will wish to treat the total
group of items on a questionnaire as a single measuring scale, and from
them obtain a single overall score on whatever they are interested in
measuring. This is a common practice, especially with the measurement
of attitudes. A typical attitude scale is composed of a number of
questions/statements selected and put together from a much larger
number of questions/statements according to certain statistical
procedures. Some of these procedures, called scaling techniques, are
discussed in this chapter.

A distinction is needed, however, between two ways in which the term
scale is used in this manual. An attitude scale could be constituted
of items each one of which employs a response scale. Aspects of
response scales are discussed in Chapter VII on "Response Anchoring." A
component of score could be achieved on each item. Adding these item
scores together, which means considering the whole set of items as a
scale, produces a total attitude score for the individual respondent.

There are, generally speaking, two methods for the construction
of scales such as attitude scales. The first method makes use of a
judging group and one of the psychological scaling methods developed by
Thurstone, as discussed in Section V-B. It results in a set of
statements being assigned scale values on a psychological continuum. The
continuum may be favorableness-unfavorableness, like-dislike, or any
other judgment. The psychological scaling methods, therefore, have
considerably greater application than the scaling of attitudes.
They can be used to scale statements or objects. They have been used,
for example, to determine the perceived favorableness of words and
phrases commonly used as rating scale response alternatives, as
discussed in Chapter VIII.

The second general method is based on the direct responses of agreement
or disagreement with attitude statements and does not result in a set
of statements being assigned scale values on a psychological continuum.
Both the Likert and Guttman scales discussed in Sections V-C and V-D
are examples of this latter method.

For information (relating to attitude scaling and scaling techniques)
beyond that contained in this manual, the following references may be
consulted.

1. Babbitt, B. A., & Nystrom, C. O. (1985). Training and human
factors research on military systems. Questionnaires: Literature
survey and bibliography. Fort Hood, TX: Army Research Institute
for the Behavioral and Social Sciences.


V-A Page 2
8 Mar 85
(s. 1 Jul 76)

2. Church, F. (1983, June). Questionnaire construction manual for
operational tests and evaluation. Prepared for the Deputy
Commander of Tactics and Test, 57th Fighter Weapons Wing/OT, Tactical
Fighter Weapons Center (TFWC), Nellis AFB, NV.

3. Edwards, A. L. (1957). Techniques of attitude scale construction.
New York: Appleton-Century-Crofts.

4. Eggemeier, F. T., Crabtree, M. S., & La Point, P. A. (1983,
October). The effect of delayed report on subjective ratings of
mental workload. Proceedings of the Human Factors Society 27th
Annual Meeting, 139-143.

5. Eggemeier, F. T., Crabtree, M. S., Zingg, J. J., Reid, G. B., &
Shingledecker, C. A. (1982). Subjective workload assessment in a
memory update task. Proceedings of the Human Factors Society 26th
Annual Meeting, 643-647.

6. Eggemeier, F. T., McGhee, J. Z., & Reid, G. B. (1983, May). The
effects of variations in task loading on subjective workload
rating scales. Proceedings of the IEEE 1983 National Aerospace
and Electronics Conference, Dayton, OH, 1099-1105.

7. Guilford, J. P. (1954). Psychometric methods (2nd ed.). New
York: McGraw-Hill.

8. Gulliksen, H., & Messick, S. (Eds.) (1969). Psychological
scaling: Theory and applications. New York: John Wiley.

9. Lemon, N. (1974). Attitudes and their measurement. New York:
John Wiley.

10. McIver, J. P., & Carmines, E. G. (1981). Unidimensional scaling.
Sage University Paper series on quantitative applications in the
social sciences, 07-024. Beverly Hills and London: Sage
Publishers.

11. Moroney, W. F. (1984). The use of checklists and questionnaires
during system and equipment test and evaluation. Shrivenham,
England: NATO Defense Research Group Panel VIII Workshop,
Applications of Systems Ergonomics to Weapon System Development, Royal
Military College of Science, Vol 1, C-59-C-68.

12. Nunnally, J. C. (1967). Psychometric theory. New York:
McGraw-Hill.

13. Thurstone, L. L. (1959). The measurement of values. Chicago:
University of Chicago Press.

14. Torgerson, W. S. (1958). Theory and methods of scaling. New
York: John Wiley.

V-B Page 1
8 Mar 85
(s. 1 Jul 76)


B. Thurstone Scales


This section discusses three scaling methods developed by L. L.
Thurstone. Thurstone investigated rank order scales and how to compare
psychological variables. He developed the law of comparative judgment
with an underlying assumption that the degree to which any two stimuli
can be discriminated is a direct function of the difference in their
status as regards the attribute in question. Thurstone generated three
new scaling methods based on his law of comparative judgment. The
three scaling methods are known as equal appearing intervals,
paired comparisons, and successive intervals. For additional detail, see the
texts referred to in Section V-A.

1.  Method  of  Equal  Appearing  Intervals 

Thurstone's method of equal appearing intervals assumes that a
group of statements of opinion about a particular issue could be
ordered on a continuum of favorableness-unfavorableness, and that
the ordering could be such that there appears to be an equal
distance between the adjacent statements on the continuum.

The following steps are followed in the method of equal appearing
intervals:

a. From the literature or pilot interviews, a large number of
statements (100 to 200) are compiled about the attribute or
object of an attitude under study. Irrelevant, ambiguous, or
poorly worded statements would not be selected.

b. A number of judges, at least 50, are obtained. They should be
similar to those individuals who will respond to the final
statements on the questionnaire. The judges independently sort
each statement into one of 11 piles. The first pile is defined
as "Unfavorable" or "Most unfavorable," the middle or sixth
pile is defined as "Neutral," and the eleventh pile is defined
as "Favorable" or "Most favorable." The other piles are left
undefined. The judges are told that the intervals between
piles or categories are to be regarded as subjectively equal.
They are also instructed to ignore their own agreement or
disagreement with each item, and to judge each item in terms of
its degree of favorableness-unfavorableness.

c. The scale value for each item is usually determined by
computing its mean or median, over all judges.

d. Twenty to 25 statements with little dispersion in their scale
values are then selected for use. The statements are selected
so that the intervals between statements' scale values are
approximately equal and/or are relatively equally spaced on the
psychological continuum.


V-B Page 2
8 Mar 85
(s. 1 Jul 76)

e. The finally selected statements are usually placed in random
order for presentation to respondents. The respondents are
asked to indicate which statements they agree with, and which
they disagree with.

f. The respondent's score is the mean or median scale value of
those statements for which he/she marked "Agree."
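Steps c and f above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statement labels, the five judges, and their pile assignments are hypothetical data, the median is used where the text allows either mean or median, and the interquartile range is one possible dispersion measure (the text does not name one).

```python
from statistics import median, quantiles

# Hypothetical sorts: statement label -> pile number (1-11) from each judge.
judge_sorts = {
    "S1": [2, 3, 2, 1, 3],
    "S2": [6, 5, 6, 7, 6],
    "S3": [10, 9, 11, 10, 10],
}

def scale_values(sorts):
    """Step c: scale value of each statement = median pile over all judges."""
    return {stmt: median(piles) for stmt, piles in sorts.items()}

def dispersion(piles):
    """Interquartile range of the judges' sorts (an assumed dispersion measure)."""
    q1, _, q3 = quantiles(piles, n=4)
    return q3 - q1

def respondent_score(values, agreed):
    """Step f: median scale value of the statements marked "Agree"."""
    return median(values[stmt] for stmt in agreed)

values = scale_values(judge_sorts)
score = respondent_score(values, ["S2", "S3"])
```

A respondent who agrees only with statements near the favorable end of the continuum receives a high score; statements with a large dispersion would be candidates for rejection in step d.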

Some considerations for use of the equal appearing intervals method
are:

a. The method of equal appearing intervals is designed to provide
an interval scale as its output. The scale is at least ordinal
(ranked).

b. The method is useful when there are a large number of
statements involved.

c. Scale values from widely differing groups of judges appear to
correlate highly with one another so long as judges with
extreme views are eliminated.

d. Graphic or numerical rating scales can be used by the judges
instead of having the statements sorted into piles. Though 11
categories are usually used, some other number can be employed.

e. There have been some psychometric questions about the
unidimensionality of Thurstone scales. Even though research has been
mixed as to which scaling methods are best, there is some
evidence that Likert and Guttman scales may be sounder. Actual
scale format does not seem to be as important as the actual
developmental procedures in the construction of the scale.

2.  The  Method  of  Paired  Comparisons 


Thurstone developed a procedure for deriving an interval scale
based upon what has been called the Law of Comparative Judgment.
Basically, it is a method by which statements such as "A is
stronger than B," "B is stronger than C," etc., are used to provide a
scale with interval properties. The objects or statements to be
ranked are presented two at a time, and the respondent is asked to
choose between them. All possible combinations of pairs have to be
presented. Hence the procedure becomes very cumbersome when there
are more than 15 or so items. The determination of scale values is
also laborious. Since the procedure is not used much in applied
research, additional detail is not presented here.
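The growth in the number of pairs is what makes the procedure cumbersome: n items require n(n - 1)/2 comparisons. The short Python sketch below enumerates the pairs and counts them; the function names are illustrative only.

```python
from itertools import combinations

def all_pairs(items):
    """Every unordered pair of items that must be shown to the respondent."""
    return list(combinations(items, 2))

def pair_count(n):
    """Number of paired comparisons needed for n items: n(n - 1)/2."""
    return n * (n - 1) // 2

pairs_for_four = all_pairs(["A", "B", "C", "D"])  # 6 pairs
pairs_for_fifteen = pair_count(15)                # 105 pairs
```

With 15 items each respondent already faces 105 judgments, and 20 items would require 190.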

3.  The  Method  of  Successive  Intervals 


The method of successive intervals is similar to the method of
equal appearing intervals. However, no assumption is made
concerning the psychological equality of the category intervals.


V-B Page 3
8 Mar 85
(s. 1 Jul 76)

It is only assumed that the categories are in correct rank order
and that their boundary lines are relatively stable. The procedure
involves estimating the widths of the categories along the
psychological continuum. From these reference points, the scale values
of the statements can be obtained. Research has shown that there
is a linear relationship between scales constructed by the method
of paired comparisons and by the method of successive intervals.

4.  New  Applications  for  Thurstone  Scales 

When Thurstone developed the law of comparative judgment, his
scaling techniques were considered a major advancement. Thurstone
scales continue to be used in survey research, although other
scaling methods, such as Likert and Guttman scales, have gained
popularity. There have been instances when rank order scaling
procedures have been integrated into other complex systems. An
illustration of this is the delta scalar method used by the U.S. Navy
and the Air Force Aerospace Medical Research Laboratory. The delta
scalar method is a complex system of rank ordering found in the
Mission Operability Assessment Technique and Systems Operability
Measurement Algorithm (U.S. Navy), and the Subjective Workload
Assessment Technique (U.S. Air Force). These systems involve
establishing a rank order scale that is converted to an interval
scale. More research will be required to determine how functional,
reliable, and valid these new procedures will be. The procedures
for embedding rank order methods into other scales are complicated
and beyond the scope of this manual.


V-C Page 1
8 Mar 85
(s. 1 Jul 76)


C.  Likert  Scales 

The Likert method of scale construction was developed because the
Thurstone procedures require extensive work and make assumptions regarding
the independence of item statements. The Likert method assumes that
all statements reflect the same attitude dimension and are hence
related to each other. The Likert approach does not assume equal
intervals between the scale values. It is sometimes called the method of
summated ratings.

The steps in Likert scale construction are as follows:

1. Item Construction

Design an initial set of items to measure an attribute. Statements
are classified in advance as "Favorable" or "Unfavorable." No
attempt is made to find an equal distribution of statements over
the whole range of the attitude of concern, and no attempt is made
to scale the statements.

2. Item Selection

Likert proposed the use of correlation analyses and analyses based
on the criterion of internal consistency to evaluate the ability of
individual items to measure an attribute.

a. A pretest is conducted. In the pretest, the respondents
indicate their degree of agreement with every statement, usually
using five response alternatives: strongly agree, agree,
undecided, disagree, and strongly disagree. Each descriptor is
assigned a numerical weight (e.g., 4, 3, 2, 1, 0) usually based
on a given series of integers in arithmetical sequence. Each
respondent is assigned a score that represents the summation of
weights associated with each item checked.

b. The criterion of internal consistency compares the mean
responses to an individual item in the high and low
subgroups. Subgroups consist of the 25% of the respondents
at each extreme of the scale.

c. The criterion of internal consistency takes account of differences in
subgroup size and different distributions of responses between
subgroups.


V-C Page 2
8 Mar 85
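d. The t test provides an accurate indication of the degree to
which an item differentiates between high and low subgroups.

$$t = \frac{\bar{X}_H - \bar{X}_L}{\sqrt{S_H^2 / n_H + S_L^2 / n_L}}$$

where, for the high (H) and low (L) subgroups:

$\bar{X}$ = mean item response of subgroup
$S^2$ = item variance of subgroup
$n$ = size of subgroup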

e. The criterion of internal consistency analysis and the
correlation analysis may lead to different conclusions regarding the
selection of items. It is recommended that both types of item
analyses be used to assist in determining which items to retain.

f. Correlational analysis focuses on how strongly the item is
related to the total scale score.

$$r_{i(T-i)} = \frac{r_{iT}\,\sigma_T - \sigma_i}{\sqrt{\sigma_T^2 + \sigma_i^2 - 2\,r_{iT}\,\sigma_T\,\sigma_i}}$$

$r_{iT}$ = correlation between item and total score
$\sigma_T$ = standard deviation of the total score
$\sigma_i$ = standard deviation of the item score

The greater the number of items, the less each item will
contribute to the variance of the scale. Each item will
contribute more bias for scales that have only a few items.

g. Each item is treated as a predictor of the respondent's total
score. Items with low item-to-total correlations should be
eliminated from the scale. Items that do not discriminate
between groups with extreme attitudes (25% of the respondents
at each extreme of the scale) should be eliminated. This
procedure leaves us with the items that will comprise the final
score.
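The two item analyses described in steps b through g can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: the pretest data are hypothetical, the t statistic is taken to be the usual unpooled two-sample form computed from subgroup means, variances, and sizes, and the item-total correlation is corrected by removing the item's own contribution to the total.

```python
from math import sqrt
from statistics import mean, pstdev, variance

# Hypothetical pretest: 8 respondents x 3 items, each weighted 0-4.
responses = [
    [4, 3, 4], [3, 4, 3], [4, 4, 2], [2, 2, 3],
    [1, 2, 1], [0, 1, 2], [3, 1, 0], [1, 0, 1],
]
totals = [sum(row) for row in responses]
item = [row[0] for row in responses]  # scores on the first item

def high_low_t(item, totals, fraction=0.25):
    """t statistic for an item's mean in the top vs. bottom subgroup by total score."""
    order = sorted(range(len(totals)), key=lambda k: totals[k])
    k = max(2, int(len(totals) * fraction))  # at least 2 per subgroup
    low = [item[j] for j in order[:k]]
    high = [item[j] for j in order[-k:]]
    se = sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / se

def pearson_r(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def corrected_item_total(item, totals):
    """Item-total correlation with the item's own contribution removed."""
    r_it = pearson_r(item, totals)
    s_t, s_i = pstdev(totals), pstdev(item)
    return (r_it * s_t - s_i) / sqrt(s_t ** 2 + s_i ** 2 - 2 * r_it * s_t * s_i)

t_value = high_low_t(item, totals)
r_corrected = corrected_item_total(item, totals)
```

Items with a small t value or a low corrected correlation would be the candidates for elimination. The sketch does not guard against subgroups in which every response is identical, since the standard error is then zero.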

3.  Item  Scoring 

a. Calculate scale scores by summing the response scores for each
item given the following values. For favorable statements,
"Strongly agree" receives a value of 4, "Agree" a value of 3,
the midpoint response alternative "Undecided" a value of 2,
"Disagree" a value of 1, and "Strongly disagree" a value of 0.
For unfavorable statements the values are reversed, so that
"Strongly disagree" receives a value of 4. High scores therefore
always indicate a favorable attitude, and low scores always
indicate an unfavorable attitude.


V-C Page 3
8 Mar 85
(s. 1 Jul 76)

b. Interpretation of individual scoring is defined relative to the
group. Each of the individual attitude scores is expressed as
a deviation from the mean of the group. The score of any
individual relative to the mean of the group is:

$$x = X - \bar{X}$$

$X$ = individual score
$\bar{X}$ = group mean

The scores are converted into z scores by dividing each deviation
score by the standard deviation of the sample. A z score will
identify the position of the respondent's score in relation to the
mean of the distribution. Using the curve as a distribution of
observations, the z score can describe the location of the score
along the horizontal axis. A z score distribution maintains the
same shape as the set of raw scores from which it was derived.

z scores indicate how many standard deviations the score lies above
or below the mean. The mean is always zero, and the standard
deviation of any set of z scores is always 1. z scores can be used
to compare scores from different distributions so long as the
distributions have approximately the same shape.
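The scoring and standardizing steps above can be sketched in Python. The sketch assumes that unfavorable statements are reverse keyed (4 minus the weight), so that high scores always indicate a favorable attitude, and it divides by the standard deviation of the group itself; the function names and sample data are illustrative only.

```python
from statistics import mean, pstdev

WEIGHTS = {
    "strongly agree": 4, "agree": 3, "undecided": 2,
    "disagree": 1, "strongly disagree": 0,
}

def item_score(response, favorable):
    """Weight for one response; reversed for unfavorable statements (assumption)."""
    weight = WEIGHTS[response]
    return weight if favorable else 4 - weight

def scale_score(responses, favorable_keys):
    """Summated score: sum of the item scores over all statements."""
    return sum(item_score(r, fav) for r, fav in zip(responses, favorable_keys))

def z_scores(scores):
    """Deviation from the group mean divided by the group standard deviation."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

# One respondent, two statements: the first favorable, the second unfavorable.
example_total = scale_score(["strongly agree", "strongly disagree"], [True, False])
example_z = z_scores([10, 12, 14])
```

Strong agreement with a favorable statement and strong disagreement with an unfavorable one both earn the maximum weight, and the resulting z scores have a mean of zero and a standard deviation of 1 by construction.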

4.  Reliability  of  the  Summated  Scale 

To compute the reliability of the Likert scale, the coefficient
alpha is recommended.

$$\alpha = \frac{N\bar{r}}{1 + \bar{r}(N - 1)}$$

$N$ = number of items
$\bar{r}$ = mean interitem correlation

The alpha coefficient provides an estimate of reliability based on
the interitem correlation matrix.
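As a worked illustration of the alpha formula, the Python sketch below computes the coefficient from the number of items and the mean interitem correlation. The helper that averages the off-diagonal entries of a correlation matrix, and the sample values, are assumptions added for the example.

```python
def mean_interitem_r(corr):
    """Mean of the off-diagonal entries of a square correlation matrix."""
    n = len(corr)
    off_diagonal = [corr[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(off_diagonal) / len(off_diagonal)

def coefficient_alpha(n_items, mean_r):
    """Alpha = N * r / [1 + r * (N - 1)], with r the mean interitem correlation."""
    return n_items * mean_r / (1 + mean_r * (n_items - 1))

# Hypothetical 3-item correlation matrix.
corr = [
    [1.0, 0.2, 0.4],
    [0.2, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
alpha_three = coefficient_alpha(3, mean_interitem_r(corr))
alpha_ten = coefficient_alpha(10, 0.3)
```

Holding the mean interitem correlation at 0.3, lengthening the scale from 3 to 10 items raises alpha from about 0.56 to about 0.81, which shows how the number of items enters the estimate.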

Factors to be taken into consideration when deciding whether to use
Likert scales include:

1. Likert scales take less time to construct than Thurstone scales.
They are one of the most widely used scales for attitude surveys.

2. It is possible to construct scales by the Likert and Thurstone
methods which will yield comparable scores.


V-C Page 4
8 Mar 85
(s. 1 Jul 76)

3. Likert scales have only ordinal properties. If there is a large
dispersion about a respondent's mean score, however, even those
properties have limited meaning. If the sole purpose of a scaling
procedure is to rank respondents according to the degree to which
they hold some attitude, then Likert scales are efficient because
of their ease of administration.

4. In addition to lacking metric properties, Likert summated scores
lack a neutral point. The interpretation of a score cannot be made
independently of the distribution of scores of some defined group.
Only the summation of the items measures the attitude. Percentile
or deviation-type norms can be calculated if the sample size is
large enough.

5. For the same number of items, scores from Likert scales may be more
reliable than scores from Thurstone scales.

6. Likert and Guttman scales both appear to be superior to Thurstone
scales.


V-D Page 1
8 Mar 85
(s. 1 Jul 76)

D. Guttman Scales

Guttman scaling was developed as an alternative to Thurstone and Likert
methods of attitude scaling. Guttman's approach to scaling is known as
scalogram or scale analysis. It is a deterministic model; it treats
its scales as being close to rulers (measures of length). The essence
of the method is to determine whether a series of statements can be
appropriately scaled. An attempt is made to identify a set of
statements which actually reflect a unidimensional scale and have a
cumulative nature. When the goal is achieved, two or more persons receiving
the same score will have responded in the same way to all of the
statements.

As an example, the following four statements comprise a Guttman scale:

a. The United Nations is mankind's savior

b. The United Nations is our best hope for peace

c. The United Nations is a constructive force in the
world

d. We should continue our participation in the
United Nations


The expected pattern of responses to these statements is "triangular."

              Person
Item    1    2    3    4    Scale Score

 a      X                        1
 b      X    X                   2
 c      X    X    X              3
 d      X    X    X    X         4

This means that, for persons who answer yes to item "a," there is a
high probability that they will answer yes to the other items. A
person who says no to "a" but yes to "b" has a high probability of
answering yes to the remaining items, and so on. The model anticipates
that the perfect relationship between the scale score and the item
score will be violated. The degree of deviation that is acceptable is
established by criteria, and measured by a coefficient of
reproducibility.


V-D Page 2
8 Mar 85
(s. 1 Jul 76)


Guttman scaling is considered psychometrically more robust than Likert
or Thurstone scaling. The coefficient of reproducibility (CR) could be
used to evaluate the degree of scalability of empirical data. The
Guttman model calls for assigning scale scores only when the
coefficient of reproducibility (CR) is greater than .90. The formula is as
follows:

$$CR = 1.0 - \frac{\text{number of errors}}{\text{total responses}} = 1.0 - \frac{\text{number of errors}}{(\text{number of items}) \times (\text{number of respondents})}$$

For example, a respondent who rates three items positively out of n
items composing the Guttman scale would be considered to have responded
to three specific items which would be considered the three items most
acceptable to the population of respondents. The interpretation of a
response to three items on a Likert scale would be that the respondent
had rated favorably any three items of n stimuli.
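The coefficient of reproducibility can be sketched in Python as follows. The text does not say how errors are counted, so the sketch makes an explicit assumption: items are ordered from most to least endorsed, a respondent with score s is expected to endorse exactly the s most endorsed items, and every cell that departs from that ideal pattern counts as one error. The response matrix is hypothetical.

```python
def reproducibility(matrix):
    """CR = 1 - errors / (items x respondents) for a 0/1 response matrix.

    Rows are respondents and columns are items. Errors are counted as
    deviations from the ideal cumulative pattern (an assumed convention).
    """
    n_resp, n_items = len(matrix), len(matrix[0])
    endorsements = [sum(row[j] for row in matrix) for j in range(n_items)]
    # Order items from most endorsed to least endorsed.
    order = sorted(range(n_items), key=lambda j: -endorsements[j])
    errors = 0
    for row in matrix:
        score = sum(row)
        ideal = [1 if rank < score else 0 for rank in range(n_items)]
        actual = [row[j] for j in order]
        errors += sum(a != b for a, b in zip(actual, ideal))
    return 1.0 - errors / (n_items * n_resp)

# Columns are items a, b, c, d; the last respondent breaks the pattern.
perfect = [[1, 1, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
observed = perfect[:2] + [[0, 1, 1, 1]] + perfect[2:] + [[1, 0, 0, 1]]
cr_perfect = reproducibility(perfect)
cr_observed = reproducibility(observed)
```

Under this counting rule the single inconsistent respondent contributes two errors out of 24 responses, so the observed data would still clear the .90 criterion. Other error-counting conventions exist and can give different values.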

The major steps in scalogram analysis are too complex to summarize
here, but are found in some of the references in Section V-A.
Procedures are available for:

1. Measuring the amount of error due to imperfect scalability.

2. Ordering the statements so that the response patterns provide the
least amount of error.

3. Determining the extent to which the data approximate the perfect
case.

4. Improving the scalability of the statements via category
combinations, statement discarding, etc.

There have been many critics of scalogram analysis. Some feel that
there is no really effective way of selecting good items by this
approach. However, the procedure is considered useful if one is
concerned with unidimensionality or if one wishes to examine small changes
in attitudes. Guttman scaling is primarily used in the construction of
attitude surveys as well as in the construction of mixed standard
scales. It may be possible to construct other mixed standard scales
for surveys that measure other factors in addition to job performance.
It is laborious to construct Guttman scales. No instances of their use
in field testing situations are known.

Even though Guttman's approach to scale analysis has not been used in
field testing situations, it is being used by the armed services for
other applications.


V-D Page 3
8 Mar 85
Adaptive testing is based on a Guttman method of scaling, and adaptive
testing is being investigated by the armed services. The Armed
Services Vocational Aptitude Battery is being developed for
computer-adaptive testing by the Navy Personnel Research and Development Center.
Each time a question is asked, there is a recalculation of
probabilities so that the next item selected is based on the subject's response
to the previous item. This allows for estimating the respondent's
future performance level as a way to select the next item. The items
are administered on a computer, and each respondent receives a
different set of questions.

Adaptive testing requires a large sample for its development. It has
been primarily used as an ability test with multiple choice questions.
There have been other types of applications, such as for interviewing.
The armed forces are a leader in adaptive testing. Even so, currently,
this model does not appear to be viable for OT&E because of the large
samples and the lead time required for development.


V-E Page 1
1 Jul 76


E. Other Scaling Techniques

Numerous other scaling techniques and combinations of methods are
reported in the literature. A discussion of them, however, is outside
the current scope of this manual.


VI-A Page 1
8 Mar 85
(s. 1 Jul 76)


Once a decision has been made regarding the type or types of items that
are to be used in a questionnaire (see Chapter IV), attention must be
given to the actual development of the items. This chapter addresses
the following development topics: mode of questionnaire items; wording
of items for both question stems and response alternatives; difficulty
of items; length of question stem; order of question stem; number of
response alternatives; and order of response alternatives. The related
topic of response anchoring is considered in Chapter VII.


As used in this manual, a distinction has been made between a
questionnaire item, a question stem, and response alternatives. A
questionnaire item has both a question stem and response alternatives. The
response alternatives are the answer choices for the question. (They
are sometimes called "options.") The question stem is that part of the
item that comes before the response alternatives.


VI-B Page 1
8 Mar 85
(s. 1 Jul 76)


B. Mode of Items


Questionnaire items are usually presented to a respondent in printed
form. However, it is possible to present items or stimuli pictorially.
There is some evidence that there are no significant differences in
subjects' responses to verbal and pictorial formats. The evidence is
conflicting, however, since it has proven difficult to establish the
meaning of pictorial anchors used as the endpoints of bipolar scales.
Researchers were not able to verify that the pictorial anchors were
actually antonyms. This could affect the bipolar assumptions of the
scales. Using a pictorial format may facilitate obtaining responses
from respondents with limited verbal comprehension, who might have
difficulty responding to questions employing lengthy definitions of
concepts or objects. If pictures are used, they should be pretested
for clarity of their presentation of the concept or object to be
evaluated.

For group administration of a questionnaire with pictorial anchors, it
would be possible to use color slides and rating forms with replicas of
the slides. In cases where it is known that the respondents have very
low reading ability, it may be desirable to present the questionnaire
orally. A tape player-recorder may be used for this purpose also.


VI-C Page 1
8 Mar 85
(s. 1 Jul 76)


C.  Wording  of  Items 

The wording of questionnaire items is a critical consideration in
obtaining valid, relevant, and reliable responses. Consider, for
example, the following three questions that were administered by Payne
(see reference below) to three matched groups of respondents:

a. "Do you think anything should be done to make it easier for people
to pay doctor or hospital bills?"

b. "Do you think anything could be done to make it easier for people
to pay doctor or hospital bills?"

c. "Do you think anything might be done to make it easier for people
to pay doctor or hospital bills?"

These questions differed only in the use of the words "should,"
"could," or "might," terms that are often used as synonyms even though
they have different connotations. The percentages of "Yes" replies to the
questions were 82, 77, and 63, respectively. The difference of 19
percentage points between the extremes is probably enough to alter the
conclusions of most studies.

A number of matters related to the wording of questionnaire items are
considered in this section. Some of the suggestions made are based
upon experimental research. Others are based upon experience,
intuition, and common sense. Several sources offering principles of
question wording are:

a. Roslow, S., & Blankenship, A. B. (1939). Phrasing the question in
consumer research. Journal of Applied Psychology, 23, 612-622.

b. Jenkins, J. G. (1941). Characteristics of the question as
determinants of dependability. Journal of Consulting Psychology, 5,
164-169.

c. Blankenship, A. B. (1942). Psychological difficulties in measuring
consumer preferences. Journal of Marketing, 66-75.

d. Payne, S. L. (1963). The art of asking questions (Rev. ed.).
Princeton, NJ: Princeton University Press.

e. Schuman, H., & Presser, S. (1981). Questions and answers in
attitude surveys: Experiments on question form, wording, and context.
New York: Academic Press, Inc.




1.  Formulation  of  the  Question  or  Question  Stem 

a. General comments regarding items and question stems. Issues
that should be noted concerning the general structure of questions
and question stems are:

(1) Question stems may be in the form of an incomplete statement,
where the statement is completed by one of the response
alternatives, or in the form of a complete question.
See Figure VI-C-1 for examples.

Figure  VI-C-1 

Example  of  Question  Form  (Item  1)  and 
Incomplete  Statement  Form  (Item  2)  of  Stem 

1.  How  qualified  or  unqualified  for  their  jobs  are  most  Army  NCOs? 
(Check  one.) 

_____  Very  well  qualified 
_____  Qualified 
_____  Borderline 
_____  Unqualified 
_____  Very  unqualified 

2. Check one of the following. Most Army NCOs are:

_____ Very well qualified for their jobs.
_____ Qualified for their jobs.
_____ Borderline.
_____ Unqualified for their jobs.
_____ Very unqualified for their jobs.


The choice between these two methods should depend on
which of the two permits simpler and more direct wording
for the item in question. Not all of the items in a
questionnaire need to be in the same form.



(2) All questionnaire items should be grammatically correct.

(3) All stems should be as neutrally expressed as possible,
and the respondents should be permitted to indicate/select
the direction of their preference. If this is not done,
the stems may influence the response distribution. If the
stems cannot be expressed neutrally, then alternate forms
of the questionnaire should be used.

(4) Respondents may not answer an item if they are not able to
give the information requested. Therefore, care should be
exercised in the wording of the question, so that it does
not call for information not possessed by the respondents.
If the respondent is not able to answer the item, the
response option should permit the respondent to say he/she
"doesn't know." The questionnaire designer should have
determined during pretesting whether a "Don't know"
response option should be included.
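
Where alternate forms of the questionnaire are used, as in item (3) above, respondents are normally divided at random between the forms. The sketch below is one minimal way to do this; the function name `assign_alternate_forms`, the form labels, and the fixed seed are assumptions made for illustration, not a procedure prescribed by this manual.

```python
import random


def assign_alternate_forms(respondent_ids, forms=("A", "B"), seed=0):
    """Randomly split respondents across forms in (nearly) equal numbers."""
    shuffled = list(respondent_ids)
    # A seeded generator makes the assignment reproducible for audit purposes.
    random.Random(seed).shuffle(shuffled)
    # Dealing the shuffled list out in turn keeps group sizes within one of each other.
    return {rid: forms[i % len(forms)] for i, rid in enumerate(shuffled)}


# Hypothetical roster of 20 respondents identified by number.
assignment = assign_alternate_forms(range(1, 21))
form_sizes = {form: list(assignment.values()).count(form) for form in ("A", "B")}
print(form_sizes)
```

Comparing the response distributions of the two groups then shows how much the direction of the stem shifted the answers.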

b. Accuracy and completeness of question stems.

(1) The stem of an item should be accurate, even though
inaccuracies may not influence the selection of the response
alternative.

(2) The question stem, in conjunction with each response
alternative, should present the question as fully as
necessary to allow the respondent to answer. It should
not be necessary for the respondent to infer essential
points. An example of an insufficiently informative
question stem is given as Item 1 in Figure VI-C-2. It is
insufficient in that no specification is given as to who
should carry the scopes. (The response alternatives are
also insufficient since the respondent is not allowed to
say "None.") Two or three questions might be needed to
obtain all the information desired. Item 2 in Figure
VI-C-2 is one revision that makes the question stem
sufficient.

(3) Generally, materials which are common to all response
alternatives should be contained in the stem, if this can
be done without the need for awkward wording.

(4) In forming questions which depend on respondents' memory
or recall capabilities, the time period a question covers
must be carefully defined. The "when" should be
specifically provided.




Figure  VI-C-2 

An Insufficiently Detailed Question Stem, Plus Revision

1. How many starlight scopes should be issued to a rifle squad?

_____ 1
_____ 2
_____ 3
_____ 4
_____ 5

2. Place a check in front of each squad member's "name" below that
you believe should be issued a starlight scope:

_____ Squad Leader           _____ Fire Team 2 Leader
_____ Fire Team 1 Leader     _____ Automatic Rifleman
_____ Automatic Rifleman     _____ Grenadier
_____ Grenadier              _____ Rifleman
_____ Rifleman               _____ Rifleman


(5) Question stems and response alternatives should be worded
so that it is clear what the respondent meant. Consider
the question "Should this cap be adopted, or its
alternate?" If the respondent answers "Yes," it would still be
unclear which cap ("this cap" or "its alternate") should
be adopted.

c.  Positive  versus  negative  wording. 

(1)  Alternative  wording  can  produce  demonstrable  effects  on 
survey  results. 

(2)  There  may  be  a  tendency  for  the  direction  of  the  question 
stem  to  be  chosen  in  the  response  alternative. 

(3) Studies have indicated that it is usually undesirable to
include negatives in question stems (unless an alternate
form with positives is also used for half of the
respondents).




(4) Questions worded in positive terms are preferred by
respondents to questions in negative terms (if alternate
forms are not being used). Questions worded negatively
may be confusing, or negative words may be overlooked.

(5) If it seems necessary to have a particular question in
negative form, the negative word (e.g., not, never) should
be underlined or italicized. Care should also be taken
that there are no double negatives, as they are frequently
misinterpreted.

(6) A question worded in negative terms can often be improved
by rephrasing it in positive terms.

(7) There is evidence that positively worded items may in
some instances receive higher mean responses than
negatively worded items. However, these findings were not
statistically significant. The research results conflict:
in other studies, positive and negative wording of items
did not affect the responses.

d. Definite versus indefinite article wording. The indefinite
articles, "a" or "an," would be used in a question such as "Did
you see a demonstration of the new night vision device?" A
comparable question using the definite article "the" would be,
"Did you see the demonstration of the new night vision device?"
There is some evidence that changing from "a" to "the" reduces
the level of suggestibility of an item. However, there is not
enough evidence to warrant a firm conclusion.

e. First, second, and third person wording. An example of a
statement written in the first person is, "Army NCOs are
understanding of my needs and problems." A statement in the second
person is, "Army NCOs are understanding of your needs and
problems," while one in the third person is, "Army NCOs are
understanding of the needs and problems of their men." It is
preferable that the framework of questions be consistent for
all questions in a questionnaire, so that responses are
comparable. A respondent's opinion of events affecting his/her
own person is often quite different from his/her opinions of
the effects of the same events on others. Hence, questions
written in the first or second person may elicit entirely
different responses from the "same" question written in the
third person.

There are occasions where each person (first, second, or third)
is appropriate. For example, the third person should probably
be used when it is desired to elicit information that might be
considered too personal for a person to answer about himself/
herself. The third person may also be used in attempts to
elicit information about the feelings inherent in a minority of
respondents, but about which many more respondents may be
aware, such as in the statement, "The Army is ahead of most
areas of civilian life in reducing racial discrimination." In
other cases, the first or second person form is not applicable,
such as in "The Army is essential for the defense of the
country." Also, the use of the third person permits a far larger
number of personnel to answer the questions, since some first
person questions that are inapplicable to many individuals
become applicable when in the third person. Instances may
occur where respondents are asked a question twice, once to
discover how they personally feel about the issue (using first
or second person), and then to discover what they judge others'
feelings on that issue are (using the third person). Some
personally worded items may be perceived as more specific to
the experience of the respondent. This may possibly provide
results that have greater accuracy for items that are
nonthreatening. Generally, however, the use of the third person
appears preferable.

f.  Loaded  and  leading  questions.  Loaded  and  leading  questions 
should  be  avoided.  Although  the  questionnaire  writers  may  not 
deliberately  attempt  to  distort  the  distribution  of  responses, 
they  may  sometimes  do  so  unintentionally. 

In Figure VI-C-3, Item 1 should be revised to maintain
neutrality by removing the adjectives applied to the rifles. It is
true that the M16 weighs less and fires more rounds faster, but
there are other characteristics (accuracy, lethality given a
hit, etc.) that are not cited. Hence, the question is loaded
because it only presents some of the data relevant to comparing
the rifles.

Items 2 and 3 in Figure VI-C-3 show loading of a different
type. In Item 2, analysis of the available alternatives leaves
the impression that the writer of the question thinks at least
some should not have a full automatic selector. Analysis of
the alternatives in Item 3 leads to the suspicion that the
writer of the question believes there should be at least one
grenade launcher in the rifle squad, since a response
alternative of zero grenade launchers was not provided.

There are many additional ways that questions can be loaded.
One way is to provide the respondent with a reason for
selecting one of the alternatives, as with the question, "Should we
increase taxes in order to get better schools, or should we
keep them about the same?" A question can also be loaded by
referring to some prestigious individual or group, as in, "A
group of experts has suggested... Do you approve of this, or do
you disapprove?"




Figure VI-C-3

Examples of Loaded Questions

1. Which rifle do you prefer, the lighter, faster shooting M16 or
the heavier, slower firing M14?


2.  Should  every  rifleman  in  the  rifle  squad  have  a  full  automatic 
selector  on  his  rifle? 


If  no,  how  many  should? 

3.  How  many  grenade  launchers  (M79)  do  you  desire  in  the  rifle 
squad? 


4  or  more 


Leading questions are similar to loaded questions. Two
examples are shown in Figure VI-C-4. The problem is that most
people are reasonably cooperative and like to help. If they
can figure out what is wanted, they will often try to comply.
The items in Figure VI-C-4 were actually used in the collection
of data in a field test. As might be expected, the impression
received from an analysis of the results is that men are, in
general, highly motivated, and use good noise discipline during
movement. (These items also allow respondents to avoid
criticizing, and to give socially desirable answers.)




Figure  VI-C-4 

Examples of Leading Questions

1. Do you think your men were pretty highly motivated on this
exercise?

Yes _____

No _____

2. Were they pretty good at using good noise discipline during
movement?

Yes _____

No _____


The best way to avoid loaded questions is to find a devil's
advocate to review them or to pretest the items on someone who
holds opposite or minority views. Another check is to ask
yourself what you think, what someone who disagrees with you
would think, and whether your response alternatives would give
the respondents a chance to present their views.

Not every change in wording will have a significant effect on
the item. This provides a measure of latitude in the design of
the items. Blatant attempts to bias an item by tone of wording
are not so likely to succeed. Research indicates that blatant
language may have no effect on responses. There is no
convincing evidence that respondents with strong attitudes toward
a topic would be less influenced by the tone of wording than
respondents who did not have a strong attitude toward the
topic.

There are times when loaded questions probably should be used.
This is when, without loading, the question would pose an
ego-threat to the respondents, so that they might give an
untruthful reply. The loading removes the ego-threat so that a
more valid response can be obtained. An example might be,
"Many people are not able to get as much schooling as they
would like. What was the last grade you completed in school?"

g. Embarrassing or self-incriminating questions. Respondents
should not be asked embarrassing or self-incriminating
questions. Consider the question, "Did you clean your weapon
regularly in Vietnam?" It is asking respondents who did not
clean their rifles regularly to expose themselves to possible
embarrassment. Thus, one would expect the percentage of "No"
responses to fall short of the true percentage not cleaning
their weapons "regularly."



Occasionally questionnaires cover topic areas that are
sensitive, and may be perceived as threatening by the respondents.
For this type of questionnaire, threatening questions elicit
greater under-reporting when closed-end questions are used.
Thus, open-ended questions are appropriate for threatening
topics. Longer questions, using the language of the
respondent, seem to decrease unwanted response effects for
threatening questions. In addition, willingness to answer threatening
questions is increased by assuring respondents that their
answers will be treated confidentially. An example of a
threatening question is presented in Figure VI-C-5, which illustrates
a longer, open-ended item used with threatening content.


Figure  VI-C-5 

Example  of  a  Threatening  Question 

Please describe and explain in your own words any problem in your
unit that might be caused by the use of too much alcohol,
marijuana, or hard drugs by upper-ranking officers, senior NCOs, or
supervisors.


h. Questions that ask respondents to go against basic
inclinations. Many people are reluctant to criticize, though they enjoy
giving praise. Thus, a question that allows respondents to
avoid criticism will bias their answers; similarly, a question
that offers them the opportunity to criticize may bias
responses because they will not wish to do so. Figure VI-C-6
illustrates this.




Figure VI-C-6

Example of a Question
Asking the Respondent to Criticize

1. Was your unit's use of fire and maneuver correct, and in
accordance with current Army doctrine?


If no, why not?


The question in Figure VI-C-6 asks the respondents either to
criticize their unit or to avoid criticism. Some respondents
might answer "No" if they have an important point to make.
However, a substantial number of others will wash their hands
of the whole affair and answer "Yes," although they might feel
that performance was not completely correct.

i. Inclusion of different subjects into the same question.
Compound questions should be avoided. These are questions that
require a respondent to give the same assessment of two or more
issues/characteristics or aspects of the subject. Respondents
must be allowed to make separate assessments of each issue.
Consider, for example, Item 1 in Figure VI-C-7. Most
respondents would probably want to rate completeness and accuracy
differently, since in most situations research has shown that
they are negatively correlated. Therefore, the two aspects of
performance should be rated separately, as shown in Items 2 and
3 of Figure VI-C-7.



Figure VI-C-7

Examples of Compound Questions and Alternatives

1. How complete and accurate was the surveillance information?

_____ Very satisfactory
_____ Satisfactory
_____ Borderline
_____ Unsatisfactory
_____ Very unsatisfactory

2. How complete or incomplete was the surveillance information?

_____ Very complete
_____ Fairly complete
_____ Borderline
_____ Fairly incomplete
_____ Very incomplete

3. How accurate or inaccurate was the surveillance information?

_____ Very accurate
_____ Fairly accurate
_____ Borderline
_____ Fairly inaccurate
_____ Very inaccurate

It may be noted that in Item 2 of Figure VI-C-7 both "complete"
and "incomplete" are included. Similarly, both "accurate" and
"inaccurate" are in the stem of Item 3. To use only one (e.g.,
"complete") in the stem would tend to inflate the number of
respondents selecting that alternative.




j. Use of giveaway words. Avoid words which lead the careful
thinker to respond in the negative, while others, thinking less
carefully, respond in the positive. Consider for example the
question, "Do you feel that your unit did its best in all
contacts over the past six months?" One wonders if any unit
can do its actual best, except very rarely. The word "all"
makes this an even more difficult question to answer
positively.

k. Ambiguous questions. Vague or ambiguous words or questions
should be avoided. For example, the question "What is your
income?" is not sufficiently specific. The respondents may
give monthly or annual income, income before or after taxes,
their income or the family income, etc.

As  another  example,  consider  item  1  in  Figure  VI-C-8. 


Figure  VI-C-8 

Example  of  Ambiguous  Question  and  Alternative 

1.  Did  you  clean  your  rifle  regularly  in  Vietnam? 

_ Yes 

_ No 

2.  How  often,  on  the  average,  did  you  clean  your  rifle  in  Vietnam? 

_____ Every day              _____ Once every three days

_____ Once every two days    _____ Once every four days

_____ Other (please specify): _____


Use of the word "regularly" without specification of the time
interval between cleanings is a defect in the question. A
respondent could justify a "Yes" by thinking to himself/herself:
"Sure, I cleaned it regularly - once every four months!"
Because of the self-exposure involved, the questionnaire item
approach to this topic is probably not capable of providing an
accurate estimate, but rewording could still make the amount of
underestimation less. So, if the data cannot be collected by
field inspection, the revised questionnaire item could read
like Item 2 in Figure VI-C-8.




Items are sometimes loaded because the wording is ambiguous,
coerces agreement, or uses jargon or technical words that are
not understandable. Review of items for illogical response
patterns may be useful when respondents have less education.
Figure VI-C-9 illustrates items which were highly ambiguous in
their wording. Some respondents did not consider the first
item in a literal sense for its impact on subsequent items.
This set of items obtained many illogical response patterns.


Figure  VI-C-9 

Example of Ambiguity of Wording

"Are there any situations you can imagine
in which you would approve of a policeman
striking an adult male citizen?"

YES, NO, NOT SURE

"Would you approve if the citizen . . ."

A. "had said vulgar and obscene things to
a policeman?"

YES, NO, NOT SURE

B. "was being questioned as a suspect in
a murder case?"

YES, NO, NOT SURE

C. "was attempting to escape from
custody?"

YES, NO, NOT SURE

D. "was attacking the policeman with his
fists?"

YES, NO, NOT SURE
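
An illogical response pattern on the items of Figure VI-C-9 can be screened for mechanically. The sketch below assumes one reading of "illogical": a respondent answers "NO" to the general question (no situation can be imagined) and then answers "YES" to at least one specific situation. The respondent records and the function name `is_illogical` are hypothetical.

```python
def is_illogical(general_answer, specific_answers):
    """Flag a respondent who rules out every situation, then approves a specific one."""
    return general_answer == "NO" and "YES" in specific_answers.values()


# Hypothetical returns: (answer to the general question, answers to parts A-D).
respondents = {
    "R1": ("YES", {"A": "NO", "B": "NO", "C": "YES", "D": "YES"}),
    "R2": ("NO", {"A": "NO", "B": "NO", "C": "NO", "D": "YES"}),
    "R3": ("NO", {"A": "NO", "B": "NO", "C": "NO", "D": "NO"}),
    "R4": ("NOT SURE", {"A": "NO", "B": "NO", "C": "YES", "D": "YES"}),
}

flagged = [
    rid
    for rid, (general, specific) in respondents.items()
    if is_illogical(general, specific)
]
print(flagged)
```

A high proportion of flagged returns points to a defect in the wording of the items rather than in the respondents.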

When items are long and negatively worded, they may create
ambiguity. This ambiguity seems to result in an increased
number of responses in the middle alternative. Pretesting the
items would provide the opportunity for modification of items
by obtaining feedback from the respondents on issues related to
the complex meaning of any technical words, and any multiple
meanings of words. Use the language of the respondents in
developing and refining items.
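
Several of the wording cautions in this section (negative words, giveaway words such as "all," and vague words such as "regularly") lend themselves to a rough automated check before pretesting. The sketch below is only a screening aid: the three word lists and the function name `lint_stem` are hypothetical and would need to be adapted to the respondent population. It does not replace review by a devil's advocate or a pretest.

```python
import re

# Hypothetical word lists; they are not taken from this manual.
NEGATIVES = {"not", "never", "no"}
GIVEAWAY_WORDS = {"all", "always", "every"}
VAGUE_WORDS = {"regularly", "frequently", "recently", "usually"}


def lint_stem(stem):
    """Return the flagged words found in a question stem, grouped by kind."""
    words = set(re.findall(r"[a-z']+", stem.lower()))
    found = {
        "negative": sorted(words & NEGATIVES),
        "giveaway": sorted(words & GIVEAWAY_WORDS),
        "vague": sorted(words & VAGUE_WORDS),
    }
    # Report only the kinds that actually occur in the stem.
    return {kind: hits for kind, hits in found.items() if hits}


for stem in (
    "Did you clean your rifle regularly in Vietnam?",
    "Do you feel that your unit did its best in all contacts over the past six months?",
    "How often, on the average, did you clean your rifle in Vietnam?",
):
    print(lint_stem(stem), "<-", stem)
```

A flagged word is a prompt to reread the item, not proof that the item is defective.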

2.  Formulation  of  the  Response  Alternatives 

When formulating the response alternatives portion of a
questionnaire item, the following points should be kept in mind:

a. All response alternatives should follow the stem both
grammatically and logically, and, if possible, be parallel in
structure.



b. If it is not known whether or not all respondents have the
background or experience necessary to answer an item (or if it
is known that some do not), a "Don't know" response alternative
should be included.

c. When preference questions are being asked (such as "Which do
you prefer, the M16 or the M14 rifle?"), the "No preference"
response alternative should usually be included. The
identification of "No preference" responses permits computation of
whether or not an actual majority of the total sample is pro
or con.

d. Respondents with a low educational level have a propensity to
use the "Don't know" response alternative.

e. When the "Don't know" response alternative is used, it should
be set apart from other responses to avoid confusing it with
the endpoint or the midpoint of the rating scale.

f. Content items can be developed which will indicate whether a
subject has knowledge regarding the topic in question. If the
subject has little topic knowledge, and there is not a "Don't
know" category, there is the potential for greater rating
error.

g. The use of the "None of the above" option or variants of it,
such as "Not enough information," is sometimes useful.

h. The option "All of the above" may on rare occasions be useful.
It seems more appropriate to academic test questions than to
the questioning of field test participants.

i. For most items, the questionnaire writer desires the respondent
to check only one response alternative. Use of the parenthetic
"(Check one.)" should eliminate the selection of more than one
alternative. It is very important to make it clear to the
respondents that they may check more than one alternative in
those fairly rare instances where the questionnaire writer does
wish to permit this.

j. In some instances, response categories as long as a sentence
may be more desirable than short descriptors. In rare cases,
numbers may be used without verbal descriptors, if the numbers
have been previously defined. It does not seem to matter if
the response alternatives are numerical, verbal (one word), or
phrases. No one type of response alternative has proven
superior to another.

k. When the quality of the item is high and the data are available,
response alternatives can be selected which have standard
deviations less than 1.00 (see Section VIII-E).




l. There is some evidence that responses to scales labeled at only
the extreme ends have been skewed toward the positive end of
the scale. Fully labeled scale points may encourage a more
balanced response distribution.

m. Number of response alternatives is discussed in Section VI-G,
order of response alternatives in Section VI-H, response
anchoring in Chapter VII, and the order of perceived
favorableness of commonly used words and phrases in Chapter VIII.
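
The computation mentioned in item c above can be illustrated with a short worked example. The tallies below are hypothetical, as is the function name `preference_summary`; the point is that an option may lead among respondents who expressed a preference without holding an actual majority of the total sample.

```python
def preference_summary(counts):
    """Return each option's percent of the total sample and any actual majority."""
    total = sum(counts.values())
    shares = {option: 100 * n / total for option, n in counts.items()}
    # An actual majority means more than half of ALL respondents, including
    # those who checked "No preference".
    majority = next(
        (
            option
            for option, n in counts.items()
            if option != "No preference" and 2 * n > total
        ),
        None,
    )
    return shares, majority


# Hypothetical tallies from 100 respondents.
counts = {"M16": 45, "M14": 35, "No preference": 20}
shares, majority = preference_summary(counts)

decided = counts["M16"] + counts["M14"]
m16_share_of_decided = 100 * counts["M16"] / decided

print("Share of total sample:", shares)
print("M16 share of those with a preference:", m16_share_of_decided)
print("Actual majority:", majority)
```

In this example the M16 is preferred by a majority of those expressing a preference, but not by a majority of the sample, a distinction that would be lost if "No preference" were not offered.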

3. Expressing Directionality and Intensity in Stem Versus Response
Alternatives

In Item 1 of Figure VI-C-10, directionality (in this case,
satisfaction) is expressed in the question stem.

Figure  VI-C-10 

Alternate  Ways  of  Expressing  Directionality  and  Intensity 

1. The M16 is a satisfactory rifle.

_____ Agree
_____ Disagree

2. The M16 is

_____ a satisfactory rifle.
_____ an unsatisfactory rifle.

3. The behavior of civilian employees of the PX toward enlisted
personnel is extremely offensive.

_____ Agree
_____ Disagree

4. The behavior of civilian employees of the PX toward enlisted
personnel is

_____ very offensive.
_____ somewhat offensive.
_____ neutral.
_____ somewhat pleasant.
_____ very pleasant.



In Item 2, the directionality is expressed in the response
alternatives. In Item 3, the stem contains terms of intensity and
directionality, while these terms are located in the response
alternatives in Item 4. Item 2 is preferred to Item 1, and Item 4
is strongly preferred to the Item 3 approach. The rationale for
this preference is similar to the discussion of positive versus
negative terms. Those who check "Disagree" to Item 3 have not been
permitted to indicate what it is they would agree with (e.g.,
those who feel employees are offensive but not extremely offensive
would have to check "Disagree," as would those who feel employees
are very pleasant), whereas the construction of Item 4 does permit
them to do so. It would take five versions of Item 3 to correct
this deficiency and achieve the coverage of opinion incorporated by
the response alternatives of Item 4.
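
The loss of information in the Item 3 approach can be shown by collapsing a set of Item 4 responses into the Agree/Disagree form. The tallies below are hypothetical, and the mapping rests on an assumption: that only respondents who would check "very offensive" on Item 4 would agree that the behavior is "extremely offensive."

```python
from collections import Counter

# Hypothetical tallies for the five response alternatives of Item 4.
item_4_counts = Counter({
    "very offensive": 5,
    "somewhat offensive": 30,
    "neutral": 10,
    "somewhat pleasant": 35,
    "very pleasant": 20,
})


def collapse_to_item_3(alternative):
    """Assumed mapping from an Item 4 alternative to an Item 3 answer."""
    return "Agree" if alternative == "very offensive" else "Disagree"


item_3_counts = Counter()
for alternative, n in item_4_counts.items():
    item_3_counts[collapse_to_item_3(alternative)] += n

# Every Item 4 alternative that ends up indistinguishable under "Disagree".
disagree_sources = [a for a in item_4_counts if collapse_to_item_3(a) == "Disagree"]

print(dict(item_3_counts))
print(disagree_sources)
```

Under these assumptions, four quite different opinions are reported as a single "Disagree" total, which is the deficiency described above.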




VI-D Page 1
1  Jul  76 


D. Difficulty of Items

1. One of the major recommendations advanced by almost every general
source on how to write sound questionnaires is "keep it simple."
Logic dictates that words used in surveys should not have multiple
meanings, nor should they be beyond the level of vocabulary of the
typical respondent. Words, phrases, and sentence structures that
the respondent can understand should be used.

Consider Item 1 in Figure VI-D-1. It contains too many
hard-to-understand words. Many respondents would have difficulty
understanding either the question or the response alternatives. In the
revision in Item 2, the words have been simplified and a
"catch-all" open-ended response alternative added (to catch all other
reasons).


Figure VI-D-1

Example of Hard to Understand Item and Alternative

1. In the highly specialized counterinsurgency environment
represented by the basically internecine affair in Vietnam, what
would you say should represent the basic essence of our
rationale for continuation of our involvement?

_____ Prolongation of attrition of enemy forces, in order to
reduce the level of threat to South Vietnam.

_____ Orderly transfer of military responsibility to the host
country, in order to produce stabilized competency to
deal with any future internal disturbances.

2. What is our main reason for staying in Vietnam? (Check one)

_____ To reduce the threat to South Vietnam by continuing the
destruction of enemy forces.

_____ To assure South Vietnam's survival while it takes over
responsibility for its own protection.

_____ Other (specify) _____



It should not be assumed that the respondent will understand what
the question writer is talking about. Consider, for example, the
question "Which do you prefer, dichotomous or open questions?" The
odds are that a fairly substantial number of people would not be
able to define these two question types. However, if they are
asked this question, they will be happy to choose. The point is
that people will not volunteer their ignorance of something,
although they may admit it if you ask them. However, this caution
goes beyond ignorance of an issue. Another problem is that the
specialists wording the question may simply have an unusual command
of their own language. Scientific jargon has been criticized.
Perhaps overlooked is the fact that there are other kinds of
jargon, too. The question askers have a responsibility to make
themselves understood. One way of screening for individuals who do not
have a basis for providing the information needed is to include one
or two pure information questions. Plan to discard questionnaire
returns from respondents who cannot answer the information
questions correctly. However, our usual policy should be to throw out
or revise items that are not understandable, rather than to throw
out the responses of the people who can't understand the item.

Schaefer, Bavelas, and Bavelas (1980) developed a method to ensure
that respondents would only be subjected to items that they could
understand. The technique that they used is called "Echo." They
developed items that were used in a performance rating scale. It
would be possible to use the "Echo" technique in the development of
survey items, too. Essentially, the "Echo" technique is a method
for wording questionnaire items in the language of the respondents.
A detailed procedure for using the "Echo" technique is available
from J. B. Bavelas (1980).

The "Echo" technique assumes that there are two separate populations
in the development of questionnaire items. One population is
the researchers, and the other population is the respondents.
Phrasing of items needs to be in the language of the respondents.
It requires content validation, i.e., confirmation that the content
is understandable to the respondents. The "Echo" technique includes
the development of a pool of items generated by a survey
directed to the target population. The sample of potential respondents
from the target population follows printed guidelines to
write the items. Another sample from the target population is
selected to sort items into categories. Part of this process
includes concurrence by the members of the sample that the categories
are mutually exclusive.




2.  Ways  of  Measuring  Item  Difficulty 




Various procedures exist for determining the difficulty or reading
comprehension level of printed material. Such a discussion is,
however, beyond the scope of this manual. Sources that may be
consulted include:

a. Bavelas, J. B. (1980). In-house report for professionals and
   nonprofessional - procedural details for the "Echo" technique.
   Victoria, British Columbia: University of Victoria, Department
   of Psychology.

b. Dale, E., & Chall, J. S. (1948). A formula for predicting
   readability. Educational Research Bulletin, 27, 11-20, 37-54.

c. Flesch, R. (1948). A new readability yardstick. Journal of
   Applied Psychology, 32, 221-233.

d. Fry, E. (1968). A readability formula that saves time. Journal
   of Reading, 11, 513-516.

e. Lorge, I. (1944). Predicting readability. Teachers College
   Record, 45, 404-419.

f. Schaefer, B. A., Bavelas, J., & Bavelas, A. (1980). Using echo
   technique to construct student-generated faculty evaluation
   questionnaires. Teaching of Psychology, 7(2), 83-86.

g. Thorndike, E. L., & Lorge, I. (1944). The teacher's word book
   of 30,000 words. New York: Columbia University Press.
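
As one illustration of the procedures cited above, the sketch below
computes a Flesch Reading Ease score (the measure introduced in source
c) so that two wordings of an item can be compared. The constants are
those of the published Flesch formula. The syllable counter is a crude
vowel-group heuristic assumed here for brevity, and the function names
are illustrative only; a production tool would need a better syllable
count.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption): count runs of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing "e" as silent unless the word ends in "le".
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


hard_item = (
    "In the highly specialized counterinsurgency environment represented "
    "by the basically internecine affair in Vietnam, what would you say "
    "should represent the basic essence of our rationale for continuation "
    "of our involvement?"
)
easy_item = "What is our main reason for staying in Vietnam?"

print(round(flesch_reading_ease(hard_item), 1))
print(round(flesch_reading_ease(easy_item), 1))
```

The two stems are the hard-to-understand item and its alternative from
the example earlier in this section; the simpler wording should receive
the higher score.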




E. Length of Question/Stem

This section notes some considerations about the length of question
stems. There is little research in this area to guide the questionnaire
writer. See Section IX-C regarding questionnaire length.

1. It is sometimes desirable to break the question stem into two or
   more sentences when the sentence structure would otherwise be
   unnecessarily complex. For instance, one sentence can state the
   situation, and one can pose the question. Lengthy question stems
   that try to explain a complicated situation to the respondent
   should be avoided. If the respondents are not aware of the facts
   presented, they may become more confused or biased than enlightened,
   and their opinion would not mean much.

2. Longer open-ended questions do not necessarily produce more
   information, or more accurate information, than shorter ones.
   However, it may take more words to achieve a proper focus.

3. Questionnaire developers have a tendency to use long question stems
   with true-false questions when "True" is the correct answer.
   Respondents often detect and react to this tendency. Field test
   questionnaires, however, should make relatively little use of
   "True" and "False" response alternatives. These alternatives are
   more appropriately used when testing whether respondents have
   acquired a required proficiency level, for example, the ability to
   visually recognize a given type of enemy aircraft.

4. To obtain higher reporting levels by respondents when threatening
   questions are asked about their behavior, longer items may be best.
   Items with 30 or more words have achieved the best results. Items
   with fewer than 30 words have not elicited reporting levels which
   were as high. One of the longer items had 49 words, and the content
   was about the use of drugs.
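
The 30-word guideline in item 4 is easy to check mechanically during
drafting. The following is a minimal sketch, assuming each draft item
has already been flagged by the writer as threatening or not; the
function names and the data layout are hypothetical.

```python
def stem_word_count(stem: str) -> int:
    """Count whitespace-separated words in a question stem."""
    return len(stem.split())


def short_threatening_items(items, min_words=30):
    """Return stems of threatening items shorter than min_words.

    items: iterable of (stem, is_threatening) pairs. The flag is assumed
    to be assigned by the questionnaire writer, not computed here.
    """
    return [
        stem
        for stem, is_threatening in items
        if is_threatening and stem_word_count(stem) < min_words
    ]


draft_items = [
    ("Have you used drugs?", True),
    ("What is your current duty position?", False),
]
print(short_threatening_items(draft_items))
```

A flagged item is only a candidate for rewording; the count says nothing
about whether the longer wording is understandable.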




F. Order of Question Stems

There are two issues to consider regarding the order of question stems.
The first has to do with the order of questions within a series of
items that are designed to explore the same topic or subject matter or
related subject matter areas. The second has to do with the order of
different groups of questions when the groups deal with fairly separate
topics or subject matter areas. For example, one group of questions
may deal with factual items, while another may deal with attitudes. If
items bearing on the same point are presented in succession, the
respondent can proceed more readily through them. Thus, this is usually
a desirable practice. An exception arises when one wishes to check the
consistency of the respondents. To do this, two (or more) similar
items are included, but at widely different points in the questionnaire.

1. Order of Questions Within a Series of Items

a. It is often recommended that the order of questions on an
   instrument be varied or assigned randomly to avoid one question
   contaminating another. The view is that the immediately preceding
   question or group of questions places the respondent in
   a "mental set" or frame of reference. For example, asking
   respondents a general question about their feelings regarding
   automobile exhaust pollution might influence responses to the
   question, "Do you prefer leaded or nonleaded gasoline?"
   Questionnaires are plagued by contextual effects attributed to item
   ordering. Respondents lacking in experience of the content
   area may change their responses as they progress through the
   questionnaire, since they may learn from previous items (order
   effects). This may damage the face validity of the responses
   to the initial items. Yet, the meaning of the items would be
   changed if they were separated from their topic areas. The
   current state of the art for context effects suggests that all
   items which are interrelated by content area may be affected by
   context effects. There is currently no way to predict which
   items will have context effects.

b. Sometimes it is recommended that broad questions be asked before
   specific questions. The rationale for this approach is
   that the respondent can more easily and validly answer specific
   questions after having had a chance to consider the broader
   context. Also, asking the specific questions first could
   influence the response to the broader question. The quality of
   responses to questions on a questionnaire will be determined by
   the respondent's background and knowledge of the topic area. A
   series of specific questions (versus general questions) will
   provide information about whether the respondent understands
   the content of the questions. It should expose any logical
   inconsistencies in response patterns. Respondents with limited
   or no experience regarding the content area may deviate from
   the logical response pattern. Their answers to questions may
   change as they become more familiar with the topic through
   order effects. Early responses may not have face validity.

   General and specific questions were empirically examined for
   order effects. The order of the questions did not appear to
   affect the way respondents marked the response alternatives.

   It is proposed that a stronger survey instrument may be provided
   by assigning general items first, followed by specific
   items on related topic areas. However, questions which are
   specific are preferred over general type questions. Contextual
   effects can be minimized by developing questions which are
   specific in content. Minimize the number of general questions.

c. The order of questions within a series of items will also
   depend upon whether filter questions are needed. A filter
   question is used to exclude respondents from a particular
   sequence of questions if those questions are irrelevant to
   them. For example, if a series of items were asked about
   different kinds of weapons, a "No" response to a question such
   as "Have you ever used the M14 rifle?" might be used to indicate
   that the respondent should skip the following question(s)
   about the M14.

   When filter questions are used by an interviewer, they can
   reduce interviewing time. Clear branching instructions are
   imperative for the interviewer. Filter questions used to
   branch in mail surveys or group-administered questionnaires
   have the potential to increase the nonresponse rate for questions
   which follow a branch. Items following a branch tend to receive
   a lower response rate. Response rates for individuals
   over 60 years of age are even lower. There are alternatives to
   branching, such as the design of different questionnaires for
   different categories of respondents. The design of different
   questionnaires for different groups of respondents is illustrated
   in Figure VI-F-1.


Figure VI-F-1

Example of Bradley Fighting Vehicle Questionnaire
for Multiple Groups


Questionnaires Designed for:

1. Driver

2. Track commander

3. Gunner

4. Other personnel
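
The filter-question logic described in item c can be sketched as a
simple branching rule. The example below is an illustration only: the
Question structure, the question identifiers, and the wording of the
follow-up items are hypothetical, and the rule assumed is that a
follow-up question is shown only when its filter question was answered
"Yes."

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Question:
    qid: str
    text: str
    # If set, ask this question only when the named filter question
    # was answered "Yes" (hypothetical branching rule).
    requires_yes_to: Optional[str] = None


def applicable_questions(questions: List[Question],
                         answers: Dict[str, str]) -> List[str]:
    """Return the ids of questions this respondent should be asked."""
    return [
        q.qid
        for q in questions
        if q.requires_yes_to is None or answers.get(q.requires_yes_to) == "Yes"
    ]


questions = [
    Question("q1", "Have you ever used the M14 rifle?"),
    Question("q2", "How satisfactory is the M14 rifle?", requires_yes_to="q1"),
    Question("q3", "How many years have you served?"),
]

print(applicable_questions(questions, {"q1": "No"}))
print(applicable_questions(questions, {"q1": "Yes"}))
```

On a paper form the same rule has to be carried by a printed branching
instruction, which is where the nonresponse problem noted above arises;
separate forms per respondent group avoid the branch altogether.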



2. Order of Different Groups of Questions

a. There is usually a psychological or logical order in which to
   ask questions, so that the questionnaire flows smoothly from
   one topic to the next and the respondent is not shifted frequently
   from one topic to another and back again. However,
   when a shift is made from one topic to another, it should be
   apparent to the respondent.

b. It is usually recommended that more difficult or more sensitive
   questions be asked later in the questionnaire, possibly at the
   end.

c. One or more easy, non-threatening questions should probably be
   asked first to build rapport. They should be short and easy to
   understand and to answer. But they should not be irrelevant to
   the objectives of the questionnaire. Verbal efforts to build
   rapport by the questionnaire administrator seem preferable to
   using questionnaire content to accomplish this task.




G.  Number  of  Response  Alternatives 

The following sections consider the number of response alternatives to
use in multiple choice, rating scale, and forced choice items. Related
topics are treated elsewhere: Section VI-C-3, formulation of response
alternatives; Section VI-H, order of response alternatives; Chapter VII,
response anchoring; Chapter VIII, order of perceived favorableness of
words and phrases.

One of the basic issues in the use of rating questions or attitude
scales is the determination of the optimum number of response
alternatives or categories. In questionnaire construction, researchers
have investigated the utility of having a scale with a greater or smaller
number of scale points. Over the years, there have been diverse
recommendations on the proper number of scale points or categories to use
in questionnaire construction. Investigations have indicated that
reliability was optimum for scale points of 2, S, 10, 11, 20, and 25. Some
recent research has proposed the use of a range of scale points between
2 and 10. The concern with the number of response alternatives stems
from the belief that a "coarse" scale with too few response
alternatives may result in a loss of information concerning the
respondents' discrimination powers. It may reduce the respondents'
cooperation in rating, as a coarse scale "forces" judgments and thereby
irritates some respondents. An extremely "fine" scale, with too many
response alternatives, may go beyond the respondents' powers of
discrimination, be excessively time consuming, or be difficult to score.

1. Number of Response Alternatives with Multiple Choice Items

No firm rules can be established regarding the number of response
alternatives to use with multiple choice items. It depends in
large part upon the question being asked, and the number of answers
logically possible. The following considerations, however, may be
noted:

a. There is some evidence that dichotomous items (items with only
   two response alternatives) are statistically inferior to items
   with more than two response alternatives.

b. Dichotomous items are easier to score than nondichotomous
   items, but they may not be accepted as well by the respondent.

c. A good nondichotomous multiple choice item usually cannot be
   written as a set of separate dichotomous items.

d. Consideration should be given to the prospect that many response
   alternatives may make a questionnaire unduly time consuming.

e. The number of choices logically possible or desirable should
   constitute an upper limit on the number of response alternatives
   used for an item.




f. Non-existent response alternatives may be checked by the
   respondent if an answer sheet is used which has more spaces than
   there are alternative answers; e.g., the answer sheet has five
   spaces for each question, but some questions have fewer than
   five alternatives.

2.  Number  of  Response  Alternatives  with  Rating  Scale  Items 

Authorities in psychometrics contend that the optimal number of
response alternatives to employ with rating scales is a matter for
empirical determination in any situation. They also suggest that
considerable variation in number around the optimal number changes
reliability very little. These conclusions seem to be supported by
the available research literature. Although rules regarding the
number of response alternatives to use with rating scales cannot,
therefore, be firmly established, the following issues can be
considered.

a. The effects of increasing or decreasing the number of response
   alternatives for a question cannot be generally specified with
   certainty. Increasing the number of response alternatives does
   not necessarily increase reliability, and there is no consistent
   relationship between the number of response alternatives
   and validity.

b. J. P. Guilford (in Psychometric methods. New York: McGraw-Hill,
   1954) reported that seven response alternatives is usually
   lower than optimal, and it may pay in some favorable situations
   to use up to 25 scale divisions. Others believe that
   seven steps or five are optimal. Some believe that five should
   be used for single or unipolar (one direction) scales, nine
   for double or bipolar scales. Many practitioners consistently
   use five-point scales. Sometimes a nine-point hedonic (pleasure)
   scale is recommended for food items, and a six-point
   scale for other uses.

c. The number of response alternatives to use is often determined
   on the basis of the degree of discrimination required. For
   example, a nine-point scale may sometimes (but not always) give
   greater discrimination than a three-point scale. Increases in
   reliability tend to level off after seven scale points, and
   there is no apparent advantage in using a large number of scale
   points.

d. Psychologists with considerable experience in military operational
   field testing feel that anything more than five alternatives
   is too great a number for many junior enlisted personnel
   to discriminate among. More nonresponses are obtained, and the
   discrimination power of answered items is not increased.




e. Questionnaire administration time is probably a function of the
   number of response alternatives.

f. There is some evidence that increasing the number of response
   alternatives seems to decrease the number of nonresponses and
   uncertain responses (e.g., "Cannot decide").

g. In addition to the response alternatives representing the
   rating scale continuum, it may be necessary to add alternatives
   such as "No opinion" or "Did not experience."

h. Scoring and data analysis considerations may affect the selection
   of the number of response alternatives. If chi-square
   tests are sufficient, two or three response alternatives might
   be adequate. However, if two or three response alternatives
   are used when nonparametric rank order correlations are employed,
   substantial "ties" on ranks will result. If parametric
   statistics are to be employed, more alternatives are usually
   better, because of the assumption of continuous distributions
   or interval scale properties.

i. In some situations, fully-labeled scales may discriminate
   better than only end-anchored scales. Responses to fully-labeled
   scales may be less skewed than responses to only end-anchored
   scales.
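
The "ties" problem noted in item h can be made concrete by counting how
many pairs of respondents share the same rating. The sketch below is an
illustration under assumed data: twelve hypothetical respondents spread
evenly over a three-point scale and over a six-point scale. The function
name is illustrative.

```python
from itertools import combinations


def tied_pair_fraction(ratings):
    """Fraction of respondent pairs who gave the same rating."""
    pairs = list(combinations(ratings, 2))
    if not pairs:
        raise ValueError("need at least two ratings")
    return sum(a == b for a, b in pairs) / len(pairs)


# Twelve hypothetical respondents, evenly spread over each scale.
three_point = [1, 2, 3] * 4
six_point = [1, 2, 3, 4, 5, 6] * 2

print(tied_pair_fraction(three_point))  # 18 tied pairs out of 66
print(tied_pair_fraction(six_point))    # 6 tied pairs out of 66
```

With the coarser scale, three times as many pairs cannot be ranked
relative to one another, which is what weakens a rank order correlation.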

3.  Number  of  Response  Alternatives  with  Forced  Choice  Items 

A number of different forced choice item formats have been used,
such as the following:

a. Two phrases or statements per item, both favorable or both
   unfavorable; choose the more descriptive or the least descriptive.

b. Three statements per item, all favorable or unfavorable; choose
   the most and least descriptive statements in each item.

c. Four statements per item, all favorable; choose the two most
   descriptive statements.

d. Four statements per item, all favorable; choose the most and
   least descriptive statements.

e. Four statements per item, two favorable and two unfavorable;
   choose the most and least descriptive statements.

f. Five statements per item, two favorable, one neutral, and two
   unfavorable in appearance; choose the most and least descriptive.




VI-G  Page  4 
1  Jul  76 

The evidence is not clear, but three or four statements per item
may be preferable to two. One study concluded that the format
described in "c" above was superior to the others. It was most
bias resistant, yielded consistently high validities under various
conditions, had adequate reliability, and was one of the best
received by respondents.




H. Order of Response Alternatives


1.  General  Considerations 

The experimental evidence on the effect that the order of presentation
of response alternatives for a question has on a subject's
choice of response is inconclusive and contradictory. Varying
conclusions include:

a. Respondents have a tendency to select the first response
   alternative in a set more than the others.

b. With multiple choice questions, there is a tendency to choose
   answers from the middle of the list if the list consists of
   numbers. Answers were selected from either the top or bottom
   of the list if the alternatives were fairly lengthy expressions
   of ideas.

c. Longer items produced responses that were closer to the center
   of the response scale. Shorter items yielded more positive
   responses.

d. Poorly motivated respondents tend to select the center or
   neutral alternatives with rating scale items.

e. Fully-labeled response alternatives yielded less skewed response
   distributions than only labeling the endpoints.

f. On items about which respondents feel strongly, the order of
   alternatives makes no difference. On items about which the
   respondent does not feel strongly, most will tend to check the
   first alternative.

g. Items that were positively worded received higher mean responses
   than negatively worded items.

h. The positive pole of rating scale response alternatives should
   be presented first since this will improve the reliability of
   the responses. However, it is important to realize that
   reliability may increase while validity decreases.

i. Placement of either the positive or negative endpoint at the
   left-hand side of the semantic differential scale was not
   associated with response style.

j. Semantic differential scales were found to confound trait
   self-descriptions with socially desirable responses on clinical
   instruments. When a socially undesirable adjective anchor was
   presented first, subjects had a tendency to select adjectives
   which were opposite in desirability.

Test item form biases are discussed in Section XII-B.



2.  Suggested  Order  for  Multiple  Choice  Items 

The following suggestions are offered regarding the order of response
alternatives for multiple choice items:

a. When the response alternatives have an immediately apparent
   logical order (e.g., they all relate to time), they should be
   put in that order.

b. When the response alternatives are numerical values, they
   should in general be put in either ascending or descending
   order.

c. When the response alternatives have no immediately apparent
   logical order, they should generally be put in random order.

d. Alternatives such as "None of the above" or "All of the above"
   should always be in the last position.

e. Alternate questionnaire forms (e.g., where the order of
   alternatives is reversed on half of the forms) are often desirable.

f. For more abstract types of questions, minimize order effects by
   developing questions which are specific in content instead of
   general.
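
Suggestions b, c, and d above can be combined into a small ordering
routine. The following is a minimal sketch, not a prescribed procedure:
the function name is hypothetical, alternatives are assumed to be text
strings, and alternatives with an apparent logical order (suggestion a)
are assumed to be ordered by the writer rather than by this routine.

```python
import random

# Catch-all alternatives that should always come last (suggestion d).
CATCH_ALL = {"none of the above", "all of the above"}


def order_alternatives(alternatives, seed=None):
    """Order response alternatives for one multiple choice item.

    Numerical alternatives are put in ascending order (suggestion b);
    others are put in random order (suggestion c); catch-all
    alternatives are kept in the last position (suggestion d).
    """
    regular = [a for a in alternatives if a.strip().lower() not in CATCH_ALL]
    catch_all = [a for a in alternatives if a.strip().lower() in CATCH_ALL]
    try:
        ordered = sorted(regular, key=float)
    except ValueError:
        # Not all alternatives are numbers: randomize instead.
        ordered = list(regular)
        random.Random(seed).shuffle(ordered)
    return ordered + catch_all


print(order_alternatives(["30", "10", "None of the above", "20"]))
print(order_alternatives(
    ["Too heavy", "Too loud", "All of the above", "Hard to clean"], seed=1))
```

Passing a fixed seed makes the random order reproducible, so the same
order can be printed on every copy of a given questionnaire form.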

3.  Suggested  Order  of  Rating  Scale  Items 

Since rating scales call for the assignment of objects along an
assumed continuum or in ordered categories along the continuum, it
follows that the response alternatives must be in order from "high"
to "low" or "low" to "high," with the choice of words for "high"
and "low" (the endpoint labels) depending upon the continuum being
used. For example, for the continuum satisfactory-unsatisfactory,
Item 1 in Figure VI-H-1 uses the "high" to "low" order, while Item
2 uses the order "low" to "high."


Figure VI-H-1


Example  of  Rating  Scale  Item 
with  Alternate  Ordering  of  Response  Alternatives 


1. The M16 rifle is:

   _____ very satisfactory.
   _____ satisfactory.
   _____ borderline.
   _____ unsatisfactory.
   _____ very unsatisfactory.

2. The M16 rifle is:

   _____ very unsatisfactory.
   _____ unsatisfactory.
   _____ borderline.
   _____ satisfactory.
   _____ very satisfactory.


Many practitioners use the "high" to "low" order. If one has
reason to believe that the order of the response alternatives makes
a difference, or wishes to make certain that it does not, then the
use of alternate questionnaire forms is recommended. Each alternate
form should list the response alternatives in a different
order. The "good" or "high" end of the scales should be at the
same end of each scale for all items in a given questionnaire form,
but the order should normally be reversed on 50% of the forms. For
example, the order shown in Item 1 in Figure VI-H-1 would be used
on half of the forms; the order shown in Item 2 on the other half.
(Normally, there would be only two questionnaire forms, one with
each order, but at times alternate forms are also needed for other
purposes. Hence, there may be more than two.)
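
When two alternate forms are used, responses must be converted to a
common scale before they are combined. The sketch below shows one way
this might be done for the five-point item in Figure VI-H-1. The form
labels "A" and "B", the alternating assignment rule, and the convention
that 5 means "very satisfactory" are assumptions made for illustration.

```python
def assign_forms(respondent_ids):
    """Alternate forms so that half of the respondents receive each order."""
    return {
        rid: ("A" if index % 2 == 0 else "B")
        for index, rid in enumerate(respondent_ids)
    }


def common_score(position, form, n_points=5):
    """Convert the checked position (1 = first alternative listed) to a
    common score on which n_points is the most favorable rating.

    Form A lists alternatives from "high" to "low" (Item 1 order);
    Form B lists them from "low" to "high" (Item 2 order).
    """
    if not 1 <= position <= n_points:
        raise ValueError("position outside the scale")
    if form == "A":
        return n_points + 1 - position
    if form == "B":
        return position
    raise ValueError("unknown form")


forms = assign_forms(["r1", "r2", "r3", "r4"])
print(forms)
print(common_score(1, "A"), common_score(1, "B"))
```

Comparing mean common scores between the two forms then gives a direct
check on whether the order of the response alternatives made a
difference.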




Chapter VII: Response Anchoring

A.  Overview 

This chapter addresses the "anchoring" of rating scale responses; that
is, the words used to define some or all of the response alternatives.
Section VII-B shows various types of response anchors, while Section
VII-C discusses anchored versus unanchored scales. The amount of
verbal anchoring is the topic of Section VII-D, while some procedures
for the selection of verbal scale anchors are presented in Section
VII-E. Finally, Section VII-F discusses balanced versus unbalanced
scales.

It should be noted that Section VI-C-3 discussed the formulation of
response alternatives, while the number and order of response
alternatives are the topics of Sections VI-G and VI-H, respectively. The
order of perceived favorableness of words and phrases is discussed in
Chapter VIII.




B.  Types  of  Response  Anchors 

There are a number of different types of response anchors that can be
used with rating scale items. Some have been shown as examples in
other chapters, such as Section VI-D. Nine types of response anchors
are shown in Figure VII-B-1. The first shows the original form of the
semantic differential. It is a combination graphic and verbal scale.
Respondents were instructed to place an "x" at a place on the line that
would represent their attitude. The use of verbal anchors with a -5
through +5 numerical continuum is shown in Item 2 of Figure VII-B-1.
Item 3 shows verbal anchors used with a 1 through 11 numerical
continuum. There is evidence that variables studied by behavioral
scientists are continuously distributed, even though the measuring
instruments yield discrete scores. These scores are approximations of the
supposedly continuous variables. A combination verbal and numerical
continuum (series) is shown in Item 4, while a verbal and alphabetical
continuum is shown in Item 5. Item 6 is similar to Item 5 since it too
is a verbal continuum. This item lacks the alphabetical and numerical
response anchors associated with other verbal anchors. Item 7 is a
typical Likert rating scale that calls for a verbal rating to a
directional statement that may be phrased either positively or negatively.
An example might be "The Modern Volunteer Army places too much emphasis
on extrinsic factors (such as beer in the barracks) as opposed to
intrinsic, job related factors (such as pay or supervision)." Item 8
is constructed on a continuous scale to obtain more discrimination
along the scale line, and it is verbally anchored. Item 9 is one
behavioral anchor from a set of nine scale points. This particular
behavioral anchor has a scale point of four.

The empirical evidence conflicts on whether scales with verbal anchors
or verbal response alternatives are more reliable than purely numerical
scales; neither has been shown to be superior. Some feel that
adding verbal anchors to a scale will increase reliability. Recent
research in ergonomics and other related applications indicates that
either numerical response alternatives or verbal response alternatives
are psychometrically acceptable. If verbal anchors are used, be sure
they are properly developed.




Figure  VII-B-1 
Types  of  Response  Anchors 

1. Combination graphic and verbal scale.

   Strong :____:____:____:____:____:____:____: Weak
        Extremely Quite Slight   Slight Quite Extremely

   LOW                                          HIGH


2. Verbal anchors with a -5 through +5 numerical continuum (series).

   Definitely                                   Definitely
   dislike                                      like

   -5   -4   -3   -2   -1    0   +1   +2   +3   +4   +5


3. Verbal anchors with a 1 through 11 numerical continuum (series).

   Definitely                                   Definitely
   dislike                                      like

    1    2    3    4    5    6    7    8    9   10   11


4. A verbal and numerical continuum (series).

   Dislike     Dislike   Dislike   Neither    Like      Like      Like
   completely  somewhat  a little  like nor   a little  somewhat  completely
                                   dislike

       1          2         3         4          5         6          7


5. A verbal and alphabetical continuum (series).

                                      Well
   Below                   Above      Above
   Average     Average     Average    Average    Outstanding

     (A)         (B)         (C)        (D)         (E)



6. A verbal continuum (series).

   Below     About     A little   A lot     One of     None
   average   average   better     better    the best   better


7. A verbal continuum (series). (Likert rating scale)

   Agree strongly   Agree   Undecided   Disagree   Disagree strongly


8. Combination verbal and continuous (series) scale.

   Attribute

   <------------------------------------------------>
   negative              neutral              positive


9. Combination behavioral anchor and numerical scale point.

   Scale Point

       4       Many troops in this unit would leave the post as
               quickly as possible after duty hours to avoid doing
               any extra work.




VII-C  Page  1 
8  Mar  85 
(s. 1 Jul 76)


C. Anchored Versus Unanchored Scales

A number of studies have been conducted on the topic known as
"anchoring effects." Unfortunately, the research evidence is
contradictory as to whether anchored or unanchored scales should be
used. It has been noted that unanchored scales may well be anchored by
the question stem, so that the response alternatives may not have to
be. When only one end of a scale is anchored, some studies have found
a tendency for respondents to move toward that extreme, but other
studies have found the opposite tendency. At least one study found
that judgment and response time decrease with anchoring. In practice,
then, it is usually best to use anchored scales.




VII-D  Page  1 
8  Mar  85 
(s.  1  Jul  76) 

D. Amount of Verbal Anchoring

The amount of verbal anchoring of a rating scale item can vary. The
scale can be anchored at the center, on the ends, at both, or at many
points along the entire continuum. There is some evidence that more
descriptive data can be obtained with more anchoring, and that greater
scale reliability is achieved with added verbal anchoring. In one
study, scales labeled only at the extreme endpoints resulted in
responses that were skewed toward the positive end of the scale.
Scales with verbal descriptors for all response alternatives may also
be better predictors of behavior, and fully labeled scale points may
encourage a more balanced response distribution. On the other hand,
adding examples to definitions does not seem to help much. (See also
Section VI-G regarding the number of response alternatives to employ.)




VII-E Page 1
8 Mar 85
(s. 1 Jul 76)

E. Procedures for the Selection of Verbal Scale Anchors

Some guidance can be offered regarding the selection of verbal scale
anchors. See also Chapter VIII.

1. Pretests for the selection of verbal anchors are valuable in
   building scale content. Rather than employing anchors which merely
   seem appropriate, anchors should preferably be selected by
   respondents similar to those who will be participating in the study.

2. Scale endpoints that are unrealistically extreme, such that few if
   any respondents would select them, should be avoided. For example,
   "Never" or "Always" may seldom apply; "Rarely" and "Usually" may be
   more appropriate. There are instances, however, where extreme
   statements are realistic. The decision here often requires
   experience with what is being rated.

3. Analysis of data is normally facilitated if the verbal scale
   anchors selected for rating scales are equally distant from each
   other in terms of scale values. See, however, Chapter VIII.

4. Scales can be anchored by examples of expected behavior based upon
   observations of behavior. There is a wide variety of behavioral
   scales using variations of the Smith and Kendall format. These
   scales use behavioral anchors constructed from critical incidents.
   Procedures for establishing behavioral anchors may be found in the
   following references.

   a. Bernardin, H. J., LaShells, M. B., Smith, P. C., & Alvares, K.
      M. (1976, February). Behavioral expectation scales: Effects
      of developmental procedures and formats. Journal of Applied
      Psychology, 61(1), 75-79.

   b. Borman, W. C., & Dunnette, M. (1975). Behavior-based versus
      task-oriented performance ratings: An empirical study. Journal
      of Applied Psychology, 60, 561-565.

   c. Finley, D. M., Osborn, H. G., Dubin, J. A., & Jeanneret, P. R.
      (1977). Behaviorally based rating scales: Effects of specific
      anchors and disguised scale continua. Personnel Psychology,
      659-669.

   d. Fivars, G. (1975). The critical incident technique: A
      bibliography. JSAS Catalog of Selected Documents in Psychology,
      5, 210.

   e. Landy, F. J., & Barnes, J. L. (1979). Scaling behavioral
      anchors. Applied Psychological Measurement, 3(2), 193-200.

   f. Smith, P. C., & Kendall, L. M. (1963). Retranslation of
      expectations: An approach to the construction of unambiguous
      anchors for rating scales. Journal of Applied Psychology, 47,
      149-155.




VII-F  Page  1 
8  Mar  85 
(s.  1  Jul  76) 

F. Scale Balance, Midpoints, and Polarity

1. Balanced Versus Unbalanced Scales

   Historically, balanced scales have been preferred by researchers.
   A scale is balanced when it has a number of positive response
   alternatives equal to the number of negative alternatives,
   regardless of the presence or absence of an "indifferent," neutral,
   or mid-scale category. A "Don't know" response alternative, if
   present, is not considered to be part of the scale, so it is not
   counted when deciding if the scale is balanced. See the examples of
   balanced and unbalanced scales in Figure VII-F-1.

   Unbalanced scales may be employed if pretest results indicate that
   many respondents will be choosing extreme response alternatives at
   one end of a scale, producing a skewed distribution of responses
   rather than the statistically expected normal distribution around
   the mean attitude. To reduce the piling up of responses at one end
   of a scale, or to add to your ability to discriminate among
   responses in that region, the scale is made unbalanced by adding
   more response alternatives on the side of the scale where the
   piling is likely to occur. This practice tends to spread the
   distribution of responses more evenly along the scale continuum.

   In cases where one has no advance information or other basis for
   expecting responses to be largely one-sided, it is normally
   desirable to have an equal number of positive and negative response
   alternatives; i.e., a balanced scale.
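
The counting rule for balance stated above can be sketched in a few
lines of Python. This is only an illustration: the function name, the
polarity tags, and the polarity assigned to each label are assumptions
made for the example, and the "Don't know" option added to scale 1b is
hypothetical.

```python
# Hypothetical polarity tags, assigned by the questionnaire writer.
POSITIVE, NEGATIVE, NEUTRAL, DONT_KNOW = "positive", "negative", "neutral", "dont_know"


def is_balanced(alternatives):
    """Return True when positive and negative alternatives are equal in number.

    `alternatives` is a list of (label, polarity) pairs. Neutral and
    "Don't know" entries are ignored, following the rule in the text.
    """
    positives = sum(1 for _, polarity in alternatives if polarity == POSITIVE)
    negatives = sum(1 for _, polarity in alternatives if polarity == NEGATIVE)
    return positives == negatives


# Scale 1b of Figure VII-F-1, with a "Don't know" option added (assumption).
scale_1b = [
    ("Effective", POSITIVE),
    ("Fairly effective", POSITIVE),
    ("Borderline", NEUTRAL),
    ("Fairly ineffective", NEGATIVE),
    ("Ineffective", NEGATIVE),
    ("Don't know", DONT_KNOW),
]

# Scale 2b of Figure VII-F-1: two favorable, four unfavorable alternatives.
scale_2b = [
    ("Quite good", POSITIVE),
    ("Rather good", POSITIVE),
    ("Somewhat poor", NEGATIVE),
    ("Rather poor", NEGATIVE),
    ("Quite poor", NEGATIVE),
    ("Very poor", NEGATIVE),
]

print(is_balanced(scale_1b))  # True
print(is_balanced(scale_2b))  # False
```

The check only counts alternatives; it says nothing about whether the
endpoint anchors are equally extreme, which still has to be judged
separately.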

2. Midpoints

   Scales may or may not include a midpoint or mid-scale response
   alternative. This does not affect their classification, but does
   affect their response distributions. There is research evidence
   that when a middle position is offered on a scale, there is a shift
   in the distribution of ratings. Up to 10-20% or more of the ratings
   may shift into the midpoint, causing a decline in the polar
   positions. Even so, questionnaires that include a midpoint yield
   response distributions similar to those without the midpoint. The
   inclusion or exclusion of a midpoint probably won't influence the
   response distribution that much as long as there are at least five
   scale points.

   As examples, items 1c, 2a, and 3 in Figure VII-F-1 show scales with
   no mid-scale point. One might exclude the mid-scale point for
   items where it is judged that respondents ought to have a
   sufficient basis for being pro or con, and where one desires to
   force respondents away from an "on the fence" position. Bipolar
   scales should be balanced in terms of the degree of extremeness
   denoted by the endpoint anchors. For example, if "Never" is used,
   then "Always" should be used as the opposite endpoint.




VII-F  Page  2 
8 Mar 85
(s.  1  Jul  76) 


Figure VII-F-1

Examples of Scale Balance, Midpoints, and Polarity

1. Balanced bipolar scales.

   a. Very progressive                       b. Effective
      Progressive                               Fairly effective
      Moderately progressive                    Borderline
      Neither progressive nor conservative      Fairly ineffective
      Conservative                              Ineffective
      Very conservative

   c. Very effective                         d. Very satisfied
      Somewhat effective                        Satisfied
      Somewhat ineffective                      Borderline
      Very ineffective                          Dissatisfied
                                                Very dissatisfied

2. Unbalanced bipolar scales.

   a. Enthusiastic                           b. Quite good
      Extremely favorable                       Rather good
      Very favorable                            Somewhat poor
      Favorable                                 Rather poor
      Fair                                      Quite poor
      Poor                                      Very poor

3. Unbalanced scale (unipolar).

      Very much
      Much
      Some
      A little
      None

3. Polarity

   Scales may be bipolar or unipolar. Item 3 in Figure VII-F-1
   illustrates a unipolar scale. Its basic feature is that it
   represents the thing being assessed as having from none to a
   maximum, with n steps in between, of some property. The question
   of balance only arises for bipolar scales. Many a bipolar scale
   could be redesigned as a unipolar scale. Instead of item 1c in
   Figure VII-F-1, one's question about effectiveness (not given)
   could have been followed by this unipolar scale of effectiveness:
   maximum effectiveness, great effectiveness, moderate effectiveness,
   slight effectiveness, and no effectiveness.

   Semantic preferences may determine whether the questionnaire writer
   uses bipolar or unipolar scales.


VIII-A Page 1
8  Mar  85 
(s.  1  Jul  76) 


Chapter VIII: Empirical Bases for Selecting
Modifiers for Response Alternatives


A.  Overview 


When constructing a questionnaire, it is often necessary to select
adjectives, adverbs, or adjective phrases to use as response
alternatives. The words selected for response alternatives should be
clearly understood by the respondents to the questionnaire, and they
should have precise meaning. There should be no confusion among
respondents as to whether one term denotes a higher degree of
favorableness or unfavorableness than another.

There is no need to guess which phrases or words are the best to use as
response alternatives. Many studies have been conducted in order to
determine the perceived favorableness of commonly used words and
phrases. These studies have determined scale values and variances for
words and phrases which can be used to order the response alternatives.
In some of the studies, ambiguous words and words that are not
appropriate to use as response alternatives have been identified.

The results of these studies and the experience of questionnaire
designers have been incorporated into this chapter in order to offer
guidelines and suggestions to be used in selecting response
alternatives. This chapter includes lists of words and procedures to
use in selecting response alternatives. Many lists of phrases with mean
scale values and standard deviations are presented. The scale values
are given for the purpose of selecting response alternatives, not for
the purpose of assigning scale values to response alternatives for data
analysis purposes.

Section VIII-B discusses things to consider in selecting response
alternatives; Section VIII-C covers the selection of response
alternatives denoting degrees of frequency; Section VIII-D, the
selection of response alternatives using order of merit lists of
descriptor terms; and Section VIII-E, the selection of response
alternatives using scale values and standard deviations. Section VIII-F
includes sample sets of response alternatives.

Scale values, standard deviations, and interquartile ranges reported in
this chapter have been taken from data presented in the following
studies:

1.  Altemeyer, R. A. (1970). Adverbs and intervals: A study of Likert
    scales. Proceedings of the Annual Convention of the American
    Psychological Association, 5(pt. 1), 397-398.




VIII-A Page 2
8  Mar  85 
(s.  1  Jul  76) 

2.  Backstrom, C. H., & Hursh-Cesar, G. (1981). Survey research.
    New York, NY: John Wiley & Sons.

3.  Beltramini, R. F. (1982). Rating-scale variations and
    discriminability. Psychological Reports, 50, 299-302.

4.  Bendig, A. W. (1953). The reliability of self-ratings as a
    function of the amount of verbal anchoring and the number of
    categories on the scale. Journal of Applied Psychology, 37, 38-41.

5.  Boote, A. S. (1981). Reliability testing of psychographic scales.
    Journal of Advertising Research, 21(5), 53-60.

6.  Cliff, N. (1959). Adverbs as multipliers. Psychological Review,
    66, 27-44.

7.  Dodd, S. C., & Gerberick, T. R. (1960). Word scales for degrees of
    opinion. Language and Speech, 3, 18-31.

8.  Dolch, N. A. (1980). Attitude measurement by semantic differential
    on a bipolar scale. The Journal of Psychology, 105, 151-154.

9.  Gividen, G. M. (1973, February). Order of merit - descriptive
    phrases for questionnaires. Unpublished report, available from the
    ARI Field Unit at Fort Hood, TX.

10. Innes, J. M. (1977). Extremity and "don't know" sets in
    questionnaire response. British Journal of Social and Clinical
    Psychology, 16, 9-12.

11. Ivancevich, J. M. (1980). Behavioral expectation scales versus
    nonanchored and trait rating systems: A sales personnel
    application. Applied Psychological Measurement, 4(1), 131-133.

12. Jones, L. V., & Thurstone, L. L. (1955). The psychophysics of
    semantics: An experimental investigation. Journal of Applied
    Psychology, 39, 31-36.

13. Mathews, J. L., Wright, C. E., & Yudowitch, K. (1975, March).
    Analysis of the results of the administration of three sets of
    descriptive phrases. Palo Alto, CA: Operations Research
    Associates.

14. Mathews, J. L., Wright, C. E., Yudowitch, K. L., Geddie, J. C., &
    Palmer, R. L. (1978, August). The perceived favorableness of
    selected scale anchors and response alternatives (Technical Paper
    319). Palo Alto, CA: Operations Research Associates, and
    Alexandria, VA: U.S. Army Research Institute for the Behavioral and
    Social Sciences. (DTIC No. AD A061755)

15. Menezes, D., & Elbert, N. F. (1979). Alternative semantic scaling
    formats for measuring store image: An evaluation. Journal of
    Marketing Research, 16(1), 80-87.




VIII-A Page 3
8  Mar  85 
(s.  1  Jul  76) 


16. Mosier, C. I. (1941). A psychometric study of meaning. Journal of
    Social Psychology, 13, 123-140.

17. Myers, J. H., & Warner, W. G. (1968). Semantic properties of
    selected evaluation adjectives. Journal of Marketing Research,
    409-412.

18. Presser, S., & Schuman, H. (1980). The measurement of a middle
    position in attitude surveys. Public Opinion Quarterly, 44(1),
    70-85.

19. Reynolds, T. J., & Jolly, J. P. (1980). Measuring personal values:
    An evaluation of alternative methods. Journal of Marketing
    Research, 17, 531-536.

20. Schuman, H., & Presser, S. (1981). Questions and answers in
    attitude surveys: Experiments on question form, wording, and
    context. New York: Academic Press, Inc.

21. U.S. Army Test and Evaluation Command (1973). Development of a
    guide and checklist for human factors evaluation of Army equipment

VIII-B Page 1
1 Jul 76

B. General Considerations in the Selection of Response Alternatives

There are several ways of selecting response alternatives. These ways
are dependent on the purpose of the questionnaire and/or on the way
the data will be analyzed. There are specific considerations when
selecting response alternatives for balanced scales, when selecting
response alternatives with extreme values, and when developing equal
interval scales. There are also general things to consider in the
selection of any response alternative.

In some cases, it is desirable to select response alternatives on more
than one basis. For example, mutually exclusive phrases may also be
selected on the basis of parallel wording.


1. Matching the Question Stem

   Descriptors should be selected to follow the question stem. For
   example, if the stem asks for degrees of usefulness, descriptors
   such as "Very useful" and "Of significant use" should be used. In
   some cases, this may mean rewording the question stem so that
   appropriate response alternatives can be selected.


2. Mixing Descriptors

   Descriptors on different continuums should usually not be mixed.
   For example, "Average" should never be used with quantitative terms
   or with qualitative terms such as "Excellent" or "Good" (since
   "average" performance for a group may very well be excellent or
   good or even poor). If the descriptors are selected for use with a
   question stem asking about satisfactory or unsatisfactory, the word
   "Satisfactory" or "Unsatisfactory" (or a synonym) should normally
   be in every response alternative, except perhaps for a neutral
   response alternative.

   Some experts go as far as to say that the wording of the response
   alternatives should be parallel for balanced scales. For example,
   if the phrase "Strongly agree" is used, then the phrase "Strongly
   disagree" should also be used. A review of the studies that have
   determined scale values for descriptors shows that some pairs of
   parallel phrases are not equally distant from a neutral point or
   from other phrases in terms of their scale values. Hence, parallel
   wording may not always provide equally distant pro and con response
   alternatives, although they may be perceived as symmetrical
   opposites.




VIII-B  Page  2 
1 Jul 76

   Using descriptors from one continuum or descriptors with parallel
   wording for a given questionnaire item has advantages: the response
   alternatives will usually fit the stem better, and they will be
   parallel to each other in meaning and appearance.

3. Selecting Response Alternatives with Clear Meaning

   Some words are difficult for respondents to use in answering
   questions. This difficulty may be the result of the respondent
   being ignorant of the meaning of the word, or not being able to
   rate the word in terms of degrees on specific scales. Such words
   should not be used as response alternatives. Some studies asked
   the respondents to indicate which words they were unable to rate.
   Table VIII-B-1 lists examples of words that were unrateable by
   subjects.


Table VIII-B-1

Words Considered Unrateable by Subjects

   Phrase              Phrase

   Adverse             Noxious
   Appalling           Peerless
   Base                Satiating
   Despicable          Seemly
   Expedient           Superlative
   Fit

From: Mosier 1941a.

   Some words appear to have two or more distinct meanings. When
   these words are rated on a continuum of favorableness-
   unfavorableness, many respondents will mark one part of the scale,
   while the other respondents will mark a different place on the
   scale. This depends on how they interpret the words, and may
   result in a bimodal distribution. Such words also should not be
   used as response alternatives. A list of words evoking bimodality
   of response is given in Table VIII-B-2.
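
One rough way to screen pretest ratings for the two-peaked pattern
described above is to count local peaks in the frequency distribution
of the ratings. The sketch below is only a heuristic under stated
assumptions: the function name, the -5 to +5 scale, and both sets of
ratings are hypothetical, and a strict local-maximum rule will miss
flat-topped peaks and can be fooled by small samples.

```python
from collections import Counter


def count_modes(ratings, scale_points):
    """Count strict local maxima in the frequency distribution of ratings.

    `scale_points` lists every allowed scale value in order, so that
    unused values are counted as zero frequency.
    """
    counts = Counter(ratings)
    freq = [counts.get(point, 0) for point in scale_points]
    modes = 0
    for i, f in enumerate(freq):
        left = freq[i - 1] if i > 0 else -1
        right = freq[i + 1] if i < len(freq) - 1 else -1
        # A peak must be used at least once and exceed both neighbors.
        if f > 0 and f > left and f > right:
            modes += 1
    return modes


SCALE = list(range(-5, 6))  # assumed -5 to +5 favorableness scale

# Hypothetical ratings of a word read two ways by different respondents.
split_ratings = [-3] * 8 + [-2] * 3 + [-1] * 1 + [0] * 9 + [1] * 2

# Hypothetical ratings of a word with one generally shared meaning.
single_ratings = [1] * 2 + [2] * 6 + [3] * 10 + [4] * 5

print(count_modes(split_ratings, SCALE))   # 2
print(count_modes(single_ratings, SCALE))  # 1
```

A count of two or more suggests the word deserves a closer look before
it is used as a response alternative; it does not by itself establish
that the word has two meanings.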




VIII-B Page 3
1  Jul  76 


Table VIII-B-2

Words Evoking Bimodality of Response

   Word(s)                     Word(s)

   Acceptable                  Irresistible
   Amazing                     Normal
   Bearable                    Tempting
   Completely indifferent      Unfit
   Extremely indifferent       Unspeakable
   Highly indifferent          Unusually indifferent
   Important                   Very indifferent
   Indifferent                 Very, very indifferent
   Indispensable

From: Mosier 1941a.

4. Selecting Nonambiguous Terms/Descriptors

   Some descriptors are more ambiguous than others. The more
   ambiguous the descriptor, the more varied the respondents'
   interpretations of the degree of favorableness denoted by the
   descriptor.

   The ambiguousness of a descriptor is measured by the variability of
   responses given to the item. One measure of variability is the
   standard deviation. When available, standard deviations (SD) are
   given with scale values in this chapter. Another measure used to
   show variability is the interquartile range. This measure is
   indicated in this chapter with scale values only when the standard
   deviations were unavailable.

   It is most desirable to select terms with small ranges or small
   standard deviations, as they will have less ambiguous meaning to
   respondents. Also, selecting a term with a small standard
   deviation decreases the chances of the meaning of the term
   overlapping with the meaning of neighboring terms.
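
The two variability measures named above can be computed from raw
pretest ratings with Python's standard library. The sketch below is
illustrative only: the function name and both sets of ratings are
hypothetical, and the interquartile range depends on the quartile
convention used (here the default of `statistics.quantiles`).

```python
import statistics


def spread(ratings):
    """Return (standard deviation, interquartile range) for a list of ratings."""
    sd = statistics.stdev(ratings)                  # sample standard deviation
    q1, _median, q3 = statistics.quantiles(ratings, n=4)  # quartile cut points
    return sd, q3 - q1


# Hypothetical ratings on a -5 to +5 favorableness scale.
clear_term = [4, 4, 5, 4, 5, 4, 4, 5]           # respondents largely agree
ambiguous_term = [-2, 4, 0, 3, -3, 5, 1, -1]    # respondents disagree widely

for name, ratings in (("clear", clear_term), ("ambiguous", ambiguous_term)):
    sd, iqr = spread(ratings)
    print(f"{name}: SD = {sd:.2f}, IQR = {iqr:.2f}")
```

Under either measure, the term with the smaller spread is the better
candidate for a response alternative.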

5. Selecting Response Alternatives

   When balanced scales with two, three, four, or five descriptors are
   sufficient for describing the distribution of respondents'
   attitudes or evaluations, the questionnaire writer can compose them
   quite satisfactorily by using a term and its literal opposite
   (effective vs. ineffective; pleasing vs. unpleasing) for two of the
   terms. A more extreme pair can be produced by using "Very" to
   modify these two terms.




VIII-B Page 4
8 Mar 85
(s.  1  Jul  76) 

   The first of several intended studies of how people rate/order
   terms that might be used for rating scale descriptors was conducted
   by Operations Research Associates and ARI just prior to the writing
   of this manual. Its results may assist questionnaire developers
   who need unbalanced scales or scales with more than five
   descriptors. In the study, each of 100 Army personnel was asked to
   assign a scale value ranging from -5 (most negative) to +5 (most
   positive) to each term in three different sets of terms, totaling
   over 100 descriptors.

   Tables VIII-B-3 and VIII-B-4 give samples of descriptors from this
   study for which mean scale values and standard deviations have been
   calculated. The list in Table VIII-B-3 was derived by first
   selecting the descriptor with the largest positive mean. The next
   descriptor selected has a mean that is at least one standard
   deviation lower. The implication of the gap of one standard
   deviation is that not more than 16% of the people would have
   assigned a lower scale value to the first descriptor than they did
   to the second descriptor, and vice versa. To this extent, the
   raters disagreed on the ordering of these two terms when rating
   about 50. The third descriptor on the list has a mean scale value
   yet another standard deviation lower. This process was repeated
   until the descriptor with the lowest mean scale value was selected.
   A descriptor was not used if its standard deviation was greater
   than 1.000.
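
The top-down selection procedure described above can be sketched in
Python as follows. This is a minimal illustration, not the study's
actual procedure: the text does not say whose standard deviation
defines the required gap, so the sketch assumes the larger of the two
adjacent SDs. The candidate pool combines values printed in Tables
VIII-B-3 and VIII-B-4 with one hypothetical entry added to show the
SD cutoff. The last lines show that the 16% figure is close to the
upper-tail area of a normal distribution beyond one standard
deviation, which is one plausible reading of where it comes from.

```python
import math
from typing import NamedTuple


class Descriptor(NamedTuple):
    phrase: str
    mean: float
    sd: float


def select_spaced(candidates, max_sd=1.0):
    """Greedy top-down selection of descriptors spaced at least one SD apart.

    Descriptors with SD above `max_sd` are dropped first. A descriptor is
    kept only if its mean lies at least one SD below the previously kept
    one (assumption: the larger of the two SDs is used as the gap).
    """
    usable = sorted(
        (d for d in candidates if d.sd <= max_sd),
        key=lambda d: d.mean,
        reverse=True,
    )
    chosen = []
    for d in usable:
        if not chosen:
            chosen.append(d)  # start with the largest positive mean
            continue
        prev = chosen[-1]
        if prev.mean - d.mean >= max(prev.sd, d.sd):
            chosen.append(d)
    return chosen


CANDIDATES = [
    Descriptor("Wholly acceptable", 4.725, 0.563),
    Descriptor("Very, very acceptable", 4.157, 0.825),
    Descriptor("Highly acceptable", 4.040, 0.631),
    Descriptor("Hypothetical vague term", 3.500, 1.200),  # invented entry
    Descriptor("Largely acceptable", 3.137, 0.991),
    Descriptor("Reasonably acceptable", 2.294, 0.722),
    Descriptor("Barely acceptable", 1.078, 0.518),
    Descriptor("Sort of acceptable", 0.940, 0.645),
    Descriptor("Neutral", 0.000, 0.000),
]

for d in select_spaced(CANDIDATES):
    print(f"{d.phrase:<24}{d.mean:>7.3f}{d.sd:>7.3f}")

# Upper-tail area of a standard normal distribution beyond +1 SD.
tail_beyond_one_sd = 0.5 * math.erfc(1 / math.sqrt(2))
print(f"{tail_beyond_one_sd:.4f}")  # 0.1587
```

With this pool the sketch keeps "Wholly," "Highly," "Reasonably," and
"Barely acceptable" followed by "Neutral," which matches the upper
half of Table VIII-B-3.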

   The list in Table VIII-B-4 was again constructed by skipping at
   least one standard deviation between adjacent terms; however, the
   starting point was at the middle, with the word "Neutral."

   Use of Table VIII-B-3 as a 10-descriptor unbalanced scale is not
   highly recommended. If one wanted a nine-descriptor scale, one
   could use the four adverbs appearing in front of "Acceptable" in
   the table in that same location, and also use them in front of
   "Unacceptable" in reverse order to create a semantically balanced
   and ordered scale. Or, one could use the five adverbs now shown
   below "Neutral," both above and below "Neutral," to create an 11-
   descriptor scale of acceptability (or effectiveness, or
   satisfactoriness, etc.). "Neutral," however, may not be a suitable
   midpoint term here, as the respondent who has neutral feelings
   (i.e., does not know or does not care) might check this response,
   whereas the term "Neutral" is intended to specify, for example, a
   midpoint between "barely acceptable" and "barely unacceptable."
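
The nine-descriptor construction just described (the same adverbs above
the midpoint and, in reverse order, below it) can be sketched as
follows. The function name is hypothetical, and the choice of
"Borderline" as the midpoint label is an assumption made for the
example in view of the caution about "Neutral."

```python
def mirrored_scale(adverbs, positive, negative, midpoint):
    """Build a semantically balanced scale from one ordered adverb list.

    `adverbs` runs from most to least extreme. The same adverbs are
    reused in reverse order on the negative side of the midpoint.
    """
    upper = [f"{adverb} {positive}" for adverb in adverbs]
    lower = [f"{adverb} {negative}" for adverb in reversed(adverbs)]
    return upper + [midpoint] + lower


# The four adverbs that precede "acceptable" in Table VIII-B-3.
adverbs = ["Wholly", "Highly", "Reasonably", "Barely"]

scale = mirrored_scale(adverbs, "acceptable", "unacceptable", "Borderline")
for position, label in enumerate(scale, start=1):
    print(position, label)
```

Mirroring the wording gives a parallel scale, but it does not by itself
guarantee that the positive and negative phrases are equally distant
from the midpoint in terms of scale values.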




VIII-B Page 5
8 Mar 85
(s.  1  Jul  76) 

Table VIII-B-3

Sample List of Phrases
Denoting Degrees of Acceptability

   Phrase                           Mean       SD

   Wholly acceptable                4.725     .563
   Highly acceptable                4.040     .631
   Reasonably acceptable            2.294     .722
   Barely acceptable                1.078     .518
   Neutral                           .000     .000
   Barely unacceptable             -1.100     .300
   Rather unacceptable             -2.020     .836
   Substantially unacceptable      -3.235     .899
   Highly unacceptable             -4.220     .576
   Completely unacceptable         -4.900     .361

From: Mathews, Wright, and Yudowitch (1975).
See Section VIII-A 13.


Table VIII-B-4

A Second Sample List of Phrases
Denoting Degrees of Acceptability

   Phrase                           Mean       SD

   Very, very acceptable            4.157     .825
   Largely acceptable               3.137     .991
   Mildly acceptable                1.556     .700
   Sort of acceptable                .940     .645
   Neutral                           .000     .000
   Barely unacceptable             -1.100     .300
   Rather unacceptable             -2.020     .836
   Substantially unacceptable      -3.235     .899
   Highly unacceptable             -4.294     .535
   Completely unacceptable         -4.900     .361

From: Mathews, Wright, and Yudowitch (1975).
See Section VIII-A 13.




VIII-B  Page  6 
8  Mar  85 
(s. 1 Jul 76)

   While the scale values from the studies cited are useful, further
   refinement is possible. That is, once having selected a candidate
   scale (set of descriptors), one could then conduct another study to
   determine if relevant judges would assign scale values indicating
   equal intervals (among means) for the terms on the candidate scale.
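
As a rough illustration of what an equal-interval check might look at,
the sketch below computes the gaps between adjacent mean scale values
and compares them with the gap an evenly spaced scale would have. The
means are those printed in Table VIII-B-3; the function name is
hypothetical, and a real follow-up study would use fresh judgments of
the candidate scale rather than these published values.

```python
def adjacent_gaps(scale_means):
    """Return the differences between neighboring means, highest to lowest."""
    ordered = sorted(scale_means, reverse=True)
    return [round(high - low, 3) for high, low in zip(ordered, ordered[1:])]


# Mean scale values printed in Table VIII-B-3.
means = [4.725, 4.040, 2.294, 1.078, 0.000,
         -1.100, -2.020, -3.235, -4.220, -4.900]

gaps = adjacent_gaps(means)
ideal_gap = (max(means) - min(means)) / (len(means) - 1)

print("gaps:     ", gaps)
print("ideal gap:", round(ideal_gap, 3))
print("largest departure from ideal:",
      round(max(abs(g - ideal_gap) for g in gaps), 3))
```

For these values the gaps range from about 0.68 to about 1.75 scale
units, so the ten phrases are ordered but not equally spaced.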

6. Selecting Descriptors for Endpoints

   Once the decision has been made as to how extreme the endpoints of
   a scale should be (see Section VII-C 4), the descriptors should be
   selected accordingly. If extreme endpoints are desired,
   descriptors that have extreme meanings should be selected. One
   guideline that can be used in selecting these descriptors is to use
   those that have the highest and lowest scale values. Another
   guideline is to review the descriptors in terms of their apparent
   meanings.

   If less extreme endpoints are desired, descriptors that do not have
   extreme scale values and that do not have apparent extreme meanings
   should be selected.

   There has been conflict about whether fully labeled scales are
   psychometrically superior to scales labeled only at the endpoints.
   Some evidence supports fully labeled scale points, which appear to
   produce response distributions that are less skewed.

7. Selecting Midpoint Responses

   Whether a middle response alternative is included or excluded on a
   scale won't make that much difference as long as the number of
   scale points is at least five and not more than eleven. For
   respondents who have a weak opinion, eliminating the middle
   alternative will force a response toward either end of a bipolar
   scale. Including the middle alternative may increase
   differentiation of response, and may be useful for individuals who
   have a strong opinion on the topic. Overall, response
   distributions for scales that include the middle alternative look
   about the same as the distributions without the middle response
   alternative. The decline in responses to endpoints of the scale
   accounts for the shift in response when a middle alternative is
   offered.

   Identification and selection of the label for the middle
   alternative is dependent on the overall selection of the other
   response alternatives. Different populations will perceive
   response alternatives differently. The means and variances for
   agreement on the semantic meaning of response alternatives will
   vary by population. To identify appropriate response alternatives,
   a sample from the target population could rate response
   alternatives for agreement on semantic meaning. Rating response
   alternatives by themselves may produce different results than
   rating response alternatives in conjunction with the item stem.

   In selecting a descriptor for a midpoint response, it is necessary
   to use a descriptor that is neutral (neither positive nor negative)
   in meaning. To some respondents, some of the commonly used
   midpoints do not appear as neutral as might be expected.




VIII-B  Page  7 
8  Mar  85 
(s. 1 Jul 76)

   Table VIII-B-5 lists several candidate midpoint terms with their
   scale values and standard deviations. This list may be helpful in
   selecting midpoint responses.


Table VIII-B-5

Candidate Midpoint Terms' Scale Values and Standard
Deviations as Determined by Several Different Studies

                                  Mean                Theoretical
                                  Scale               Middle
   Term                           Value      SD       Scale Value

   About average                   3.77      .85          3.50
   Acceptable                       .73      .66           .00
   Acceptable                     11.12     2.59         10.00
   Acceptable                      2.39     1.46           .00
   All right                      10.76     1.42         10.00
   Average                         3.08      --        [illegible]
   Average                          .86     1.08           .00
   Average                        10.84     1.55         10.00
   Borderline                      -.02      .32           .00
   Borderline                       .00      .20           .00
   Borderline                      -.06      .31           .00
   Doesn't make any difference     2.83     3.73*         5.00
   Don't know                      4.82      .82*         5.00
   Fair                            6.50                   5.50
   Fair                             .78      .85           .00
   Fair                            9.52     2.06         10.00
   Fair                            4.96      .77*         5.00
   Neutral                          .00      .00           .00
   Neutral                          .02      .18           .00
   Neutral                         9.80     1.50         10.00
   Neutral                        10.18     2.01         10.00
   Normal                          6.70     1.43          6.00
   Ordinary                        6.50     1.43          6.00
   O.K.                             .87     1.24           .00
   O.K.                           10.28     1.67         10.00
   So-so                          10.08     1.87         10.00
   Undecided                       4.76     3.73*         5.00

*Interquartile range shown rather than the standard deviation.
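
One simple way to read Table VIII-B-5 is to compare each term's mean
scale value with the theoretical middle of the scale on which it was
rated. The sketch below does this for one row per term, taken from
rows whose theoretical middle is .00. The function name and the
half-point tolerance are assumptions made for the example, and offsets
are only comparable among rows rated on the same scale.

```python
# (term, mean scale value, SD, theoretical middle) from Table VIII-B-5,
# one row per term, all with a theoretical middle of .00.
ROWS = [
    ("Acceptable", 0.73, 0.66, 0.00),
    ("Average", 0.86, 1.08, 0.00),
    ("Borderline", -0.02, 0.32, 0.00),
    ("Fair", 0.78, 0.85, 0.00),
    ("Neutral", 0.02, 0.18, 0.00),
    ("O.K.", 0.87, 1.24, 0.00),
]


def off_center(rows, tolerance=0.5):
    """Return terms whose mean lies more than `tolerance` from the middle.

    The tolerance of half a scale point is an arbitrary assumption.
    """
    return [term for term, mean, _sd, middle in rows
            if abs(mean - middle) > tolerance]


for term, mean, sd, middle in ROWS:
    print(f"{term:<12} offset = {mean - middle:+.2f}  SD = {sd:.2f}")

print("Rated noticeably off center:", off_center(ROWS))
```

A small offset shows only that raters placed the term near the middle
of the scale; whether its meaning is clear to respondents in an actual
questionnaire is a separate question.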




VIII-B Page 8
8 Mar 85
(s. 1 Jul 76)

   Words commonly used for midpoint responses are discussed below:

   a. Average.

      "Average" should never be used in conjunction with adjectives
      such as "Excellent," "Good," etc. "Average" has no meaning
      when used with these words. For example, "Average" performance
      may be superior or it may be completely unsatisfactory.
      Furthermore, most evaluators do not have the experience or
      competence to even know what an "Average" performance is.
      Typically, when "Average" is used on a field test evaluation
      form, only 5% or 10% of responders rate the subject as below
      average and 50% or 40% rate it above average. The data from
      such a question indicate that the response alternatives are not
      well formulated. Therefore, as a general rule, it is usually
      inappropriate to use any form of "Average" in a questionnaire,
      and it is always inappropriate to use "Average" in conjunction
      with phrases such as "Excellent," "Good," "Poor," etc.

      If "Average" is used, it should be with extreme care and only
      when one is interested in comparing performances or items with
      each other. It should not be used when one desires to find out
      how "good" or how "bad" an item or performance is.
      Significantly above average performance may be extremely
      unsatisfactory.

   b. No opinion.

      "No opinion" is unacceptable as a mid-scale term, as it usually
      denotes that a person has no opinion due to lack of knowledge
      or due to not having thought about an issue. "No opinion" can
      be used as a response alternative if it represents a specific
      type of information that is wanted.

   c. Neutral.

      "Neutral" is considered a less desirable mid-scale term to use
      than "Borderline." Although every respondent in the study gave
      the term zero, the meaning on a questionnaire is not clear (see
      page VIII-B 4). Two out of 52 respondents indicated it was
      unrateable. In another study, "Neutral" had a mean scale value
      of .02 and a standard deviation of .18. Because of the
ambiguity  of  meaning  of  "Neutral"  (e.g..  feeling  of  the  respon¬ 
dent  versus  midpoint  alternative).  It  Is  not  recommended  that 
It  be  used  as  midpoint  on  most  questionnaires. 

d.  Marginal. 

"Marginal”  Is  sometimes  used  as  a  midpoint  response  alterna¬ 
tive.  Interviews  with  test  subjects  Indicated  that  the  term 
"Marginal"  In  most  cases  had  a  meaning  of  above  "Bordorllne"  or 
still  satisfactory,  but  very  close  to  being  unsatisfactory. 
Hence.  Indications  are  that  there  may  be  more  desirable  terms 
to  use  than  "Marginal.” 




e. Borderline.

"Borderline" is preferred by some experts as a midpoint
response. In an administration to Fort Hood soldiers of over
1,500 questionnaires using the term "Borderline" as a midpoint,
there was not one instance of reported confusion among those
completing the questionnaires. However, there were times when
"Borderline" had a larger standard deviation than "Neutral."
(Again, "Neutral" by definition implies zero to most persons,
but its frame of reference is ambiguous.)

f. Uncertain.

"Uncertain" is unacceptable as a midpoint term, as it implies
that with additional knowledge or thought a decision could be
made that would fall into one of the other categories.

g. Undecided.

"Undecided" is also unacceptable as a mid-scale term for the
same reasons as "Uncertain."

h. Neither agree nor disagree.

"Neither agree nor disagree" and similar descriptors written in
this form may be used as midpoint responses. They have the
advantage of paralleling the rest of the descriptors in the
set, and they denote a position exactly in the middle of the
endpoints. This term, like "Neutral," can also imply
uncertainty, indecision, or a lack of knowledge rather than a firm
knowledge that it represents a midpoint.

i. No effect.

"No effect" may be employed as a midpoint term when it is used
with a set of descriptors to measure the type of effect that an
activity will have. For instance, it can be used on a
continuum from beneficial to detrimental.

j. Ordinary.

"Ordinary" should not be used as a mid-scale term. In one
study, use of the term "Ordinary" as the mid-scale value
resulted in a marked skewing of responses at the low end of the
scale. This resulted from the common use of "Ordinary" to
imply inferiority.

k. Fair.

"Fair" should not be used as a mid-scale term. In one study,
the median scale value for "Fair" was a full point above the
mid-scale point. It appears that for some subjects the meaning
of "Fair" is distinctly favorable.



l. Acceptable.

"Acceptable" is not a desirable word to use as a mid-scale
item. In one study it exhibited a marked bimodality of
response, indicating that subjects disagreed on the degree of
favorableness denoted by the term. In a recent study,
"Acceptable" had a large standard deviation of 1.46.

m. Normal.

"Normal" is not a desirable word to use as a mid-scale item.
In one study it exhibited a marked bimodality of response,
indicating that the word "Normal" has different meanings for
different subjects. This term would be classified as a synonym
for "Average."

n. Medium.

"Medium" may possibly be used as a midpoint term. In one study
there was a piling up of judgments for "Medium" at the middle
scale position.

o. O.K. or all right.

"O.K." or "All right" have sometimes been used as midpoint
response alternatives. However, they have a tendency to be
rated somewhat positively. They also have larger standard
deviations than other terms mentioned, indicating that there is
ambiguity in their meaning.

p. So-so.

"So-so" is another term sometimes used as a midpoint response.
In one study it had a scale value of 10.08, which was very
close to the middle scale value of 10.00; but it also had a
fairly large standard deviation of 1.87. Its use is not
recommended.

q. Don't know.

"Don't know" is an unacceptable term to use as a midpoint. It
usually means to the subject that, with additional knowledge or
more time to think about the issue, he/she could choose one of
the other alternatives.

r. Doesn't make any difference.

"Doesn't make any difference" should not be used as a midpoint
response alternative because it implies a more negative value
than a middle value. In one study it had a scale value of
2.83, where the middle scale value was 5.00. It also had an
interquartile range of 3.13, which means that there was a lot
of disagreement among subjects as to its meaning.



8.  Selecting  the  Don't  Know  Response  Alternative 

Some respondents tend to mark the "Don't know" category. They
select it when they are not familiar with the content of a
question or when they decline to express an opinion. Researchers
cannot predict who will shift into a "Don't know" category when it
is offered. It has been a common practice in public opinion survey
research to leave out the "Don't know" category; individuals who
volunteer a "Don't know" response (even though it is not offered)
have it recorded as their selected response alternative. Human
factors researchers who target surveys toward respondents who may
not have had access to specific experiences or equipment would
appropriately include the "Don't know" category. When included in
a survey, the "Don't know" category should be set apart from the
other response alternatives to avoid confusing it with other
categories. An example of the "Don't know" response alternative is
presented in Figure VIII-B-1.


Figure VIII-B-1

Inclusion of the "Don't Know" Response Alternative
for a Maintenance Vehicle Questionnaire


Ease of Use Rating Scale

  5        4         3            2            1            DK
 Very                                         Very
 Easy     Easy   Borderline   Difficult    Difficult    Don't Know


"How easily can you:

1.  Gain access to the vehicle's
    batteries?                        5    4    3    2    1    DK

2.  Check battery and fluid
    levels?                           5    4    3    2    1    DK

3.  Check tightness of battery
    cables?"                          5    4    3    2    1    DK
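
Setting "Don't know" apart on the form suggests tallying it apart when the ratings are summarized, so that it is not averaged in as if it were a scale point. The manual does not prescribe an analysis; the sketch below is only one plausible reading, and the function name and the sample answers are hypothetical.

```python
from statistics import mean


def summarize(responses):
    """Summarize numeric ratings and "DK" marks separately."""
    scores = [r for r in responses if r != "DK"]
    dk_count = len(responses) - len(scores)
    return {
        "n_rated": len(scores),
        "mean_rating": mean(scores) if scores else None,
        "dk_share": dk_count / len(responses) if responses else None,
    }


# Hypothetical answers to item 1 (access to the vehicle's batteries).
item_1 = [5, 4, 4, "DK", 3, "DK", 2, 4]
print(summarize(item_1))
```

Reporting the share of "DK" marks next to the mean shows how many respondents lacked the experience needed to rate the item.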

9.  Selecting  Positive  and  Negative  Descriptors 


If  a  balanced  scale  is  desired,  it  is  necessary  to  select  an  equal 
number  of  positive  and  negative  descriptors.  In  most  cases,  it  is 
easy  to  determine  if  a  descriptor  is  positive  or  negative  by  seeing 
on  which  side  of  the  (zero)  midpoint  its  scale  value  falls.  For 
example,  "Mildly  like"  has  a  positive  scale  value,  and  "Mildly 
dislike"  has  a  negative  scale  value. 
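
The balance check described above can be sketched in a few lines. The scale values are taken from Table VIII-E-4; the function names and the small tolerance band that treats a term such as "Neutral" (.02) as a midpoint are assumptions made for illustration.

```python
def count_sides(scale_values, midpoint=0.0, tolerance=0.1):
    """Count descriptors clearly above and clearly below the midpoint."""
    positives = sum(1 for v in scale_values.values() if v > midpoint + tolerance)
    negatives = sum(1 for v in scale_values.values() if v < midpoint - tolerance)
    return positives, negatives


def is_balanced(scale_values, midpoint=0.0, tolerance=0.1):
    """True when the positive and negative descriptors are equal in number."""
    positives, negatives = count_sides(scale_values, midpoint, tolerance)
    return positives == negatives


# Scale values from Table VIII-E-4 (Jones and Thurstone, 1955).
candidate = {
    "Like very much": 2.91,
    "Mildly like": 0.85,
    "Neutral": 0.02,
    "Mildly dislike": -0.74,
    "Dislike very much": -2.49,
}
print(count_sides(candidate), is_balanced(candidate))
```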


Researchers at the Army Research Institute, Fort Hood, recommend
avoiding the use of unbalanced directionality or intensity of
attitude in the stem of a question. They usually work with rating
scales similar to the semantic differential, which simplifies the
composition of the stem. These researchers do not request a rating
for how effective a system is; instead they ask for a rating of
how "effective-ineffective" the system is. Alternatively, they
delete the dimension from the stem altogether, and show the
respondent the dimension only in the list of response alternatives.
This approach is thought to create a formal balance in the response
alternatives. Using these techniques, the stems either have a
formal balance or avoid specifying the dimensionality of the rating.

The presentation of a positive or negative endpoint displayed first
at the left-hand side of the scale has been investigated. It was
found that order of presentation for the placement of positive or
negative endpoints was not associated with response style
(non-trait measures). Measures of personality traits are most
influenced by balancing positive and negative descriptors or stems.
Operational test and evaluation survey constructors would not need
to be concerned about positioning the positive or negative endpoint
first.

10.  Selecting  Type  of  Response  Alternative 

Points along the continuum of a scale have been anchored by many
different types of response alternatives. For example, there have
been response alternatives such as numbers, adjectives, adverbs,
phrases, sentences, descriptions of behavior, etc. It does not
seem to matter if response alternatives are numerical, verbal (one
word), phrases, or behavioral. No one type of response alternative
has proven superior to another.

11.  Selecting  Terms  Showing  Equal  Intervals 

Some experts argue that, in order to perform analyses on the basis
of numerical values or weights, the intervals between rating scale
response alternatives should be equal. This would be desirable,
but in many cases it is impossible because many words have not been
assigned scale values. But when scale values are available, the
response alternatives can be selected to be as equally distant
apart as possible when doing so is considered important.

There is a tendency for some questionnaire constructors to select
phrases with parallel wording to indicate equal intervals. (They
may also do so for other reasons.) However, if equal intervals are
considered important, phrases should be selected based upon scale
values if available. For example, in Table VIII-E-9, "Highly
adequate" has a scale value of 3.843 while the parallel term
"Highly inadequate" has a scale value of -4.196. This places
"Highly inadequate" further away from the neutral point than
"Highly adequate."
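
The asymmetry in this example is easy to check by computing the distance between adjacent terms once scale values are known. The sketch below uses the three means quoted from Table VIII-E-9; the function name is an illustrative assumption.

```python
def gaps(terms):
    """Return distances between adjacent terms, ordered from high to low."""
    ordered = sorted(terms, key=lambda t: t[1], reverse=True)
    return [
        (upper[0], lower[0], round(upper[1] - lower[1], 3))
        for upper, lower in zip(ordered, ordered[1:])
    ]


# Means from Table VIII-E-9 (Matthews, Wright, and Yudowitch, 1975).
scale = [
    ("Highly adequate", 3.843),
    ("Neutral", 0.000),
    ("Highly inadequate", -4.196),
]
for upper, lower, distance in gaps(scale):
    print(f"{upper} -> {lower}: {distance}")
```

Equal intervals would show the same distance on every line; here the two distances differ by about a third of a scale unit even though the wording is parallel.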



12. Use of Unscaled Terms

Some discussion is in order regarding the use of terms ignoring
their scale values or to which no scale values have been assigned.
An illustration of the first of these practices is from a study in
which ARI had 21 Army officers involved in operational field
testing rank-order 16 terms that included "Outstanding," "Superior,"
"Excellent," and "Very good." "Excellent" was ranked as less
positive than "Outstanding" by 14 of the officers, while it was
ranked as less positive than "Superior" by 17 of the officers.
However, there was maximum disagreement as to whether "Outstanding"
or "Superior" was first or second on the scale. That is, 12 ranked
"Superior" first and "Outstanding" second, while nine of the
officers assigned the reverse ordering to these two words. All
officers ranked "Outstanding," "Superior," and "Excellent" as more
positive than "Very good." "Outstanding" is sometimes interpreted
to denote only that the performance is among the best of a group,
without any implication as to quality; e.g., although a student's
grade of 65 out of 100 points was failing, his/her performance may
have been "Outstanding" since no other student in the class scored
above 60.

What are the consequences to the developer of rating scale items of
discovering a near 50%-50% split as in the ordering of
"Outstanding" and "Superior"? Does it mean they cannot be used
together as part of the descriptors of a rating scale item? The
answer is, "Normally yes." In Figure VIII-B-2, we would have better
discrimination if "Outstanding" were replaced by "Excellent," with
the position formerly occupied by "Excellent" being filled by "Very
good." "Superior" and "Outstanding" or similarly overlapping terms
should normally not be used on the same scale.


Figure VIII-B-2

Two Formats Using "Outstanding" and "Superior"

1.   _____ 1. Superior
     _____ 2. Outstanding
     _____ 3. Excellent
     _____ 4. Good
     _____ 5. Fair
     _____ 6. Poor

2.   Superior   Outstanding   Excellent   Good   Fair   Poor

                     (Circle one word)


When functioning as questionnaire consultants or developers in
field test situations where respondents are enlisted personnel, ARI
has recommended and used very little variety in its rating scale
items. Arrays such as those shown in Figure VIII-B-3 are almost
always proposed and used. Sometimes the middle term is deleted.
Several reasons for the lack of variety are that a standard format
facilitates: (1) comparability of rating distributions with
previous tests, and (2) understanding by soldier respondents, who are
often not high school graduates.


Figure VIII-B-3

Response Alternatives
Frequently Recommended by ARI

( ) Very satisfactory      ( ) Very effective      ( ) Very acceptable
( ) Satisfactory           ( ) Effective           ( ) Acceptable
( ) Borderline             ( ) Borderline          ( ) Borderline
( ) Unsatisfactory         ( ) Ineffective         ( ) Unacceptable
( ) Very unsatisfactory    ( ) Very ineffective    ( ) Very unacceptable


C.  Selection  of  Response  Alternatives  Denoting  Degrees  of  Frequency 

Some questionnaire designers use verbal descriptors to denote degrees
of frequency. Table VIII-C-1 shows such a list of verbal descriptors.

A study showed that there was a great deal of variability in meaning
for frequency phrases. Questionnaires should, whenever possible, use
response alternatives that include a number designation or percentage
of time meant by each word used as a response alternative.


Table VIII-C-1
Degrees of Frequency

                      Scale     Interquartile
Phrase                Value         Range

Always                 8.99          .52
Without fail           8.89          .61
Often                  7.23         1.02
Usually                7.17         1.36
Frequently             6.92          .77
Now and then           4.79         1.40
Sometimes              4.78         1.83
Occasionally           4.13         2.06
Seldom                 2.45         1.05
Rarely                 2.08          .61
Never                  1.00          .50

From: Dodd and Gerberick (1960).
See Section VIII-A 7.
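
The variability noted for frequency phrases can be made concrete by checking which neighboring phrases in Table VIII-C-1 overlap. The sketch below approximates each phrase's middle 50% of judgments as the scale value plus or minus half the interquartile range. Centering the band on the scale value is an assumption made for illustration, as are the function names and the subset of phrases.

```python
# (phrase, scale value, interquartile range) from Table VIII-C-1.
PHRASES = [
    ("Always", 8.99, 0.52),
    ("Often", 7.23, 1.02),
    ("Usually", 7.17, 1.36),
    ("Sometimes", 4.78, 1.83),
    ("Occasionally", 4.13, 2.06),
    ("Seldom", 2.45, 1.05),
    ("Never", 1.00, 0.50),
]


def band(value, iqr):
    """Approximate middle-50% band, assumed to be centered on the scale value."""
    return (value - iqr / 2, value + iqr / 2)


def overlapping_pairs(phrases):
    """Return adjacent phrase pairs whose approximate bands overlap."""
    ordered = sorted(phrases, key=lambda p: p[1], reverse=True)
    pairs = []
    for higher, lower in zip(ordered, ordered[1:]):
        low_edge_of_higher = band(higher[1], higher[2])[0]
        high_edge_of_lower = band(lower[1], lower[2])[1]
        if high_edge_of_lower >= low_edge_of_higher:
            pairs.append((higher[0], lower[0]))
    return pairs


print(overlapping_pairs(PHRASES))
```

Phrases that appear in an overlapping pair are poor choices for adjacent response alternatives unless each is accompanied by a number or percentage.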



D. Selection of Response Alternatives Using Order of Merit Lists of
Descriptor Terms

An order of merit list of descriptors does not provide scale values nor
show the variance of each phrase along some continuum. In addition,
the list does not represent an equal interval scale. However, such
lists are still useful for selecting response alternatives if the main
concern is to select response categories so that each respondent will
agree on the relative degree of "goodness" of the terms. Tables
VIII-D-1 and VIII-D-2 give examples of order of merit lists of
descriptor terms.


Table VIII-D-1

Order of Merit of Selected Descriptive Terms

Order of Merit     Descriptive Term

       1           Very superior
       2           Very outstanding
       3           Superior
       4           Outstanding
       5           Excellent
       6           Very good
       7           Good
       8           Very satisfactory
       9           Satisfactory
      10           Marginal
      11           Borderline
      12           Poor
      13           Unsatisfactory
      14           Bad
      15           Very poor
      16           Very unsatisfactory
      17           Very bad
      18           Extremely poor
      19           Extremely unsatisfactory
      20           Extremely bad

From: Gividen (1973). See Section VIII-A 9.



Table VIII-D-2

Order of Merit of Descriptive Terms
Using "Use" as a Descriptor

Order of Merit     Descriptive Term

       1           Extremely useful
       2           Very useful
       3           Of significant use
       4           Of considerable use
       5           Of much use
       6           Of moderate use
       7           Of use
       8           Of some use
       9           Of little use
      10           Not very useful
      11           Of slight use
      12           Of very little use
      13           Of no use

From: Gividen (1973). See Section VIII-A 9.



E.  Selection  of  Response  Alternatives  Using  Scale  Values  and  Standard 
Deviations 

Using scale values and standard deviations to select response
alternatives will give a more refined set of phrases than using an order
of merit list. Other sections above have discussed specific
considerations in selecting descriptors. In general, response alternatives
selected from lists of phrases with scale values should usually have
the following characteristics:

1.  The scale values of the terms should be as far apart as possible.

2.  The scale values of the terms should be as equally distant as
possible.

3.  The terms should have small variability (small standard deviations
or interquartile ranges).

4.  Other things being equal, the terms should have parallel wording.

Tables VIII-E-1 through VIII-E-24 give lists of phrases which have
scale values and, when possible, standard deviations or interquartile
ranges. They are based on empirical evidence, and may be used to select
response alternatives.
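
The first three characteristics can be approximated mechanically. The sketch below is one possible heuristic, not a procedure given in the manual: it drops phrases whose standard deviation exceeds a chosen limit, sets equally spaced targets between the highest and lowest remaining scale values, and picks the phrase nearest each target. The function name, the 1.0 limit, and the subset of phrases (from Table VIII-E-1) are illustrative assumptions; parallel wording (characteristic 4) still has to be judged by eye.

```python
# (phrase, scale value, SD) from Table VIII-E-1 (U.S. Army, 1973).
CANDIDATES = [
    ("Excellent", 6.27, 0.54),
    ("Very good", 5.19, 0.75),
    ("Good", 4.25, 0.90),
    ("Moderately good", 3.58, 0.77),
    ("Not very good", 2.10, 0.85),
    ("Barely acceptable", 1.79, 0.90),
    ("Poor", 1.06, 1.11),
    ("Very poor", 0.76, 0.95),
    ("Extremely poor", 0.36, 0.76),
]


def pick_terms(candidates, k, max_sd=1.0):
    """Pick k phrases near k equally spaced targets (requires 2 <= k <= pool size)."""
    pool = [c for c in candidates if c[2] <= max_sd]
    values = [value for _, value, _ in pool]
    hi, lo = max(values), min(values)
    step = (hi - lo) / (k - 1)
    chosen = []
    for i in range(k):
        target = hi - i * step
        remaining = [c for c in pool if c not in chosen]
        # Nearest scale value wins; a smaller SD breaks ties.
        best = min(remaining, key=lambda c: (abs(c[1] - target), c[2]))
        chosen.append(best)
    return [name for name, _, _ in chosen]


print(pick_terms(CANDIDATES, k=5))
```

The result should be treated as a starting point; a questionnaire constructor would still review the chosen phrases for wording and for suitability to the respondents.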




Table VIII-E-1
Acceptability Phrases

Phrase                                     Average      SD

Excellent                                    6.27       .54
Perfect in every respect                     6.22       .86
Extremely good                               5.74       .81
Very good                                    5.19       .75
Unusually good                               5.03       .98
Very good in most respects                   4.62       .72
Good                                         4.25       .90
Moderately good                              3.58       .77
Could use some minor changes                 3.28      1.09
Not good enough for extreme conditions       3.10      1.30
Not good for rough use                       2.72      1.15
Not very good                                2.10       .85
Needs major changes                          1.97      1.12
Barely acceptable                            1.79       .90
Not good enough for general use              1.76      1.21
Better than nothing                          1.22      1.08
Poor                                         1.06      1.11
Very poor                                     .76       .95
Extremely poor                                .36       .76

From: U.S. Army (1973). See Section VIII-A 21.




Table VIII-E-2

Degrees of Excellence: First Set

                    Scale
Phrase              Value       SD

Superior            20.12      1.17
Fantastic           20.12      0.83
Tremendous          19.84      1.31
Superb              19.80      1.19
Excellent           19.40      1.73
Terrific            19.00      2.45
Outstanding         18.96      1.99
Wonderful           17.32      2.30
Delightful          16.92      1.85
Fine                14.80      2.12
Good                14.32      2.08
Pleasant            13.44      2.06
Nice                12.56      2.14
Acceptable          11.12      2.59
Average             10.84      1.55
All right           10.76      1.42
O.K.                10.28      1.67
Neutral              9.80      1.50
Fair                 9.52      2.06
Mediocre             9.44      1.80
Unpleasant           5.04      2.82
Bad                  3.88      2.19
Very bad             3.20      2.10
Unacceptable         2.64      2.04
Awful                1.92      1.50
Terrible             1.76       .77
Horrible             1.48       .87

From: Myers and Warner (1968).
See Section VIII-A 17.




Table VIII-E-3

Degrees of Excellence: Second Set

                     Scale
Phrase               Value       SD

Best of all           6.15      2.48
Excellent             3.71      1.01
Wonderful             3.51       .97
Mighty fine           2.88       .67
Especially good       2.86       .82
Very good             2.56       .87
Good                  1.91       .76
Pleasing              1.58       .65
O.K.                   .87      1.24
Fair                   .78       .85
Only fair              .71       .64
Not pleasing          -.83       .67
Poor                 -1.55       .87
Bad                  -2.02       .80
Very bad             -2.53       .64
Terrible             -3.09       .98

From: Jones and Thurstone (1955).
See Section VIII-A 12.




Table VIII-E-4
Degrees of Like and Dislike

                       Scale
Phrase                 Value       SD

Like extremely          4.16      1.62
Like intensely          4.05      1.59
Strongly like           2.96       .69
Like very much          2.91       .60
Like very well          2.60       .78
Like quite a bit        2.32       .52
Like fairly well        1.51       .59
Like                    1.35       .77
Like moderately         1.12       .61
Mildly like              .85       .47
Like slightly            .69       .32
Neutral                  .02       .18
Like not so well        -.30      1.07
Like not so much        -.41       .94
Dislike slightly        -.59       .27
Mildly dislike          -.74       .35
Dislike moderately     -1.20       .41
Dislike                -1.58       .94
Don't like             -1.81       .97
Strongly dislike       -2.37       .53
Dislike very much      -2.49       .64
Dislike intensely      -3.33      1.39
Dislike extremely      -4.32      1.86

From: Jones and Thurstone (1955).
See Section VIII-A 12.




Table VIII-E-5
Degrees of Good and Poor

                       Scale
Phrase                 Value       SD

Exceptionally good     18.56      2.36
Extremely good         18.44      1.61
Unusually good         17.08      2.43
Remarkably good        16.68      2.19
Very good              15.44      2.77
Quite good             14.44      2.76
Good                   14.32      2.08
Moderately good        13.44      2.23
Reasonably good        12.92      2.93
Fairly good            11.96      2.42
Slightly good          11.84      2.19
So-so                  10.08      1.87
Not very good           6.72      2.82
Moderately poor         6.44      1.64
Reasonably poor         6.32      2.46
Slightly poor           5.92      1.96
Poor                    5.72      2.09
Fairly poor             5.64      1.68
Quite poor              4.80      1.44
Unusually poor          3.20      1.44
Very poor               3.12      1.17
Remarkably poor         2.88      1.74
Exceptionally poor      2.52      1.19
Extremely poor          2.08      1.19

From: Myers and Warner (1968).
See Section VIII-A 17.




Table VIII-E-6
Degrees of Good and Bad

                     Scale
Phrase               Value

Extremely good       3.449
Very good            3.250
Unusually good       3.243
Decidedly good       3.024
Quite good           2.880
Rather good          2.755
Good                 2.712
Pretty good          2.622
Somewhat good        2.462
Slightly good        2.417
Slightly bad         1.497
Somewhat bad         1.323
Rather bad           1.232
Bad                  1.024
Pretty bad           1.018
Quite bad             .924
Decidedly bad         .797
Unusually bad         .662
Very bad              .639
Extremely bad         .470

From: Cliff (1959).
See Section VIII-A 6.




Table VIII-E-7

Degrees of Agree and Disagree

Phrase                      Mean       SD

Decidedly agree             2.77       .41
Quite agree                 2.37       .49
Considerably agree          2.21       .42
Substantially agree         2.10       .50
Moderately agree            1.47       .41
Somewhat agree               .94       .41
Slightly agree               .67       .36
Perhaps agree                .52       .46
Perhaps disagree            -.43       .46
Slightly disagree           -.64       .38
Somewhat disagree           -.93       .47
Moderately disagree        -1.35       .42
Quite disagree             -2.16       .57
Substantially disagree     -2.17       .51
Considerably disagree      -2.17       .45
Decidedly disagree         -2.76       .43

From: Altemeyer (1970).
See Section VIII-A 1.




Table VIII-E-8
Degrees of More and Less

                      Scale     Interquartile
Phrase                Value         Range

Very much more         8.02          .61
Much more              7.67         1.04
A lot more             7.50         1.06
A good deal more       7.29          .98
More                   6.33         1.01
Somewhat more          6.25          .98
A little more          6.00          .58
Slightly more          5.99          .57
Slightly less          3.97          .56
A little less          3.96          .54
Less                   3.64         1.04
Much less              2.55         1.06
A good deal less       2.44         1.11
A lot less             2.36         1.03
Very much less         1.96          .52

From: Dodd and Gerberick (1960).
See Section VIII-A 7.




Table VIII-E-9

Degrees of Adequate and Inadequate

Phrase                         Mean        SD

Totally adequate               4.620      .846
Absolutely adequate            4.540      .921
Completely adequate            4.490      .825
Extremely adequate             4.412      .719
Exceptionally adequate         4.380      .869
Entirely adequate              4.340      .863
Wholly adequate                4.314     1.038
Fully adequate                 4.294      .914
Very very adequate             4.063      .876
Perfectly adequate             3.922     1.026
Highly adequate                3.843      .606
Most adequate                  3.843      .978
Very adequate                  3.420      .851
Decidedly adequate             3.140     1.536
Considerably adequate          3.020      .874
Quite adequate                 2.980      .979
Largely adequate               2.863      .991
Substantially adequate         2.608     1.030
Reasonably adequate            2.412      .771
Pretty adequate                2.306      .862
Rather adequate                1.755      .893
Mildly adequate                1.571      .670
Somewhat adequate              1.327      .793
Slightly adequate              1.200      .566
Barely adequate                 .627      .928
Neutral                         .000      .000
Borderline                     -.020      .316
Barely inadequate             -1.157      .638
Mildly inadequate             -1.353      .621
Slightly inadequate           -1.380      .772
Somewhat inadequate           -1.882      .732
Rather inadequate             -2.102      .974
Moderately inadequate         -2.157     1.017
Fairly inadequate             -2.216      .800
Pretty inadequate             -2.347      .959
Considerably inadequate       -3.500      .680
Very inadequate               -3.735      .777
Decidedly inadequate          -3.780      .944
Most inadequate               -3.980     1.545
Highly inadequate             -4.196      .741
Very very inadequate          -4.460      .537
Extremely inadequate          -4.608      .527
Fully inadequate              -4.667      .676
Exceptionally inadequate      -4.680      .508
Wholly inadequate             -4.784      .498
Entirely inadequate           -4.792      .644
Completely inadequate         -4.800      .529
Absolutely inadequate         -4.880      .431
Totally inadequate            -4.900      .412

From: Matthews, Wright, and Yudowitch (1975).
See Section VIII-A 13.


Table VIII-E-10

Degrees of Acceptable and Unacceptable

Phrase                           Mean        SD

Wholly acceptable                4.725      .563
Completely acceptable            4.686      .610
Fully acceptable                 4.412      .867
Extremely acceptable             4.392      .716
Most acceptable                  4.157      .915
Very very acceptable             4.157      .825
Highly acceptable                4.040      .631
Quite acceptable                 3.216      .956
Largely acceptable               3.137      .991
Acceptable                       2.392     1.456
Reasonably acceptable            2.294      .722
Moderately acceptable            2.280      .722
Pretty acceptable                2.000     1.125
Rather acceptable                1.939      .818
Fairly acceptable                1.840      .924
Mildly acceptable                1.686      .700
Somewhat acceptable              1.458     1.241
Barely acceptable                1.078      .518
Slightly acceptable              1.039      .522
Sort of acceptable                .940      .645
Borderline                        .000      .200
Neutral                           .000      .000
Marginal                         -.120      .515
Barely unacceptable             -1.100      .300
Slightly unacceptable           -1.255      .589
Somewhat unacceptable           -1.765      .674
Rather unacceptable             -2.020      .836
Fairly unacceptable             -2.160      .880
Moderately unacceptable         -2.340      .681
Pretty unacceptable             -2.412      .662
Reasonably unacceptable         -2.440      .753
Unacceptable                    -2.667     1.381
Substantially unacceptable      -3.235      .899
Quite unacceptable              -3.388     1.066
Largely unacceptable            -3.392      .818
Considerably unacceptable       -3.440      .779
Notably unacceptable            -3.500     1.044
Decidedly unacceptable          -3.837     1.017
Highly unacceptable             -4.294      .535
Most unacceptable               -4.420      .724
Very very unacceptable          -4.490      .500
Exceptionally unacceptable      -4.540      .607
Extremely unacceptable          -4.686      .464
Completely unacceptable         -4.900      .361
Entirely unacceptable           -4.900      .361
Wholly unacceptable             -4.922      .269
Absolutely unacceptable         -4.922      .334
Totally unacceptable            -4.941      .235

From: Matthews, Wright, and Yudowitch (1975).
See Section VIII-A 13.




Table VIII-E-11
Comparison Phrases

Phrase                     Mean        SD

Best of all                4.896      .510
Absolutely best            4.843      .459
Truly best                 4.600      .721
Undoubtedly best           4.569      .823
Decidedly best             4.373      .839
Best                       4.216     1.459
Absolutely better          4.060      .988
Extremely better           3.922      .882
Substantially best         3.700      .922
Decidedly better           3.412      .933
Conspicuously better       3.059      .802
Moderately better          2.255      .737
Somewhat better            1.643      .801
Rather better              1.816      .719
Slightly better            1.157      .776
Barely better               .961      .656
Absolutely alike            .588     1.623
Alike                       .216      .847
The same                    .157      .801
Neutral                     .000      .000
Borderline                 -.061      .314
Marginal                   -.184      .919
Barely worse              -1.039      .816
Slightly worse            -1.216      .498
Somewhat worse            -2.078      .860
Moderately worse          -2.220      .944
Noticeably worse          -2.529     1.036
Worse                     -2.667     1.423
Notably worse             -3.020     1.038
Largely worse             -3.216     1.108
Considerably worse        -3.275     1.206
Conspicuously worse       -3.275      .887
Much worse                -3.286      .808
Substantially worse       -3.460      .899
Decidedly worse           -3.760      .907
Very much worse           -3.941      .752
Absolutely worse          -4.431      .823
Decidedly worst           -4.431      .748
Undoubtedly worst         -4.510      .872
Absolutely worst          -4.686     1.291
Worst of all              -4.776     1.298

From: Matthews, Wright, and Yudowitch (1975).
See Section VIII-A 13.


VIII-E Page 14
8 Mar 85
(s. 1 Jul 76)


Table VIII-E-12
Degrees of Satisfactory and Unsatisfactory


Phrase                        Scale Value      SD

Quite satisfactory               4.35          .95
Satisfactory                     3.69          .87
Not very satisfactory            2.11          .76
Unsatisfactory but usable        2.00          .87
Very unsatisfactory               .69         1.32

From: U.S. Army (1973). See Section VIII-A 21.


Table VIII-E-13
Degrees of Unsatisfactory


Phrase                        Scale Value

Unsatisfactory                   1.47
Quite unsatisfactory             1.00
Very unsatisfactory               .75
Unusually unsatisfactory          .75
Highly unsatisfactory             .71
Very, very unsatisfactory         .25
Extremely unsatisfactory          .10
Completely unsatisfactory         .00

From: Mosier (1941).
See Section VIII-A 16.


VIII-E Page 15
8 Mar 85
(s. 1 Jul 76)


Table VIII-E-14
Degrees of Pleasant


Phrase                        Scale Value

Extremely pleasant               3.490
Very pleasant                    3.174
Unusually pleasant               3.107
Decidedly pleasant               3.028
Quite pleasant                   2.849
Pleasant                         2.770
Rather pleasant                  2.743
Pretty pleasant                  2.738
Somewhat pleasant                2.505
Slightly pleasant                2.440

From: Cliff (1959).
See Section VIII-A 6.


Table VIII-E-15
Degrees of Agreeable


Phrase                        Scale Value

Very, very agreeable             5.34
Extremely agreeable              5.10
Highly agreeable                 5.02
Completely agreeable             4.96
Unusually agreeable              4.86
Very agreeable                   4.82
Quite agreeable                  4.45
Agreeable                        4.19

From: Mosier (1941).
See Section VIII-A 16.


VIII-E Page 16
8 Mar 85
(s. 1 Jul 76)


Table VIII-E-16
Degrees of Desirable


Phrase                        Scale Value

Very, very desirable             5.66
Extremely desirable              5.42
Completely desirable             5.38
Unusually desirable              5.23
Highly desirable                 5.15
Very desirable                   4.96
Quite desirable                  4.76
Desirable                        4.50

From: Mosier (1941).
See Section VIII-A 16.


Table VIII-E-17
Degrees of Nice


Phrase                        Scale Value

Extremely nice                   3.351
Unusually nice                   3.155
Very nice                        3.016
Decidedly nice                   2.969
Pretty nice                      2.767
Quite nice                       2.738
Nice                             2.636
Rather nice                      2.568
Somewhat nice                    2.488
Slightly nice                    2.286

From: Cliff (1959).
See Section VIII-A 6.


VIII-E Page 17
8 Mar 85
(s. 1 Jul 76)


Table VIII-E-18
Degrees of Adequate


Phrase                        Scale Value      SD

More than adequate               4.13         1.11
Adequate                         3.39          .87
Not quite adequate               2.40          .8$
Barely adequate                  2.10          .84
Not adequate                     1.83          .98

From: U.S. Army (1973).
See Section VIII-A 21.


Table VIII-E-19
Degrees of Ordinary


Phrase                        Scale Value

Ordinary                         2.074
Very ordinary                    2.073
Somewhat ordinary                2.038
Rather ordinary                  2.034
Pretty ordinary                  2.026
Slightly ordinary                1.980
Decidedly ordinary               1.949
Extremely ordinary               1.936
Unusually ordinary               1.87S

From: Cliff (1959).
See Section VIII-A 6.


VIII-E Page 18
8 Mar 85
(s. 1 Jul 76)


Table VIII-E-20
Degrees of Average


Phrase                        Scale Value

Rather average                   2.172
Average                          2.145
Quite average                    2.101
Pretty average                   2.094
Somewhat average                 2.080
Unusually average                2.062
Extremely average                2.052
Very average                     2.039
Slightly average                 2.023
Decidedly average                2.020

From: Cliff (1959).
See Section VIII-A 6.


Table VIII-E-21
Degrees of Hesitation


                                               Interquartile
Phrase                          Scale Value        Range

Without hesitation                 7.50             6.54
With little hesitation             5.83             3.40
Hesitant                           4.77             1.06
With some hesitation               4.38             1.60
With considerable hesitation       3.29             3.39
With much hesitation               3.20             5.25
With great hesitation              2.41             6.00

From: Dodd and Gerberick (1960). See Section VIII-A 7.


VIII-E Page 19
8 Mar 85
(s. 1 Jul 76)


Table VIII-E-22
Degrees of Inferior


Phrase                        Scale Value

Slightly inferior                1.S20
Somewhat inferior                1.516
Inferior                         1.323
Rather inferior                  1.295
Pretty inferior                  1.180
Quite inferior                   1.127
Decidedly inferior               1.013
Unusually inferior                .963
Very inferior                     .927
Extremely inferior                .705

From: Cliff (1959).
See Section VIII-A 6.


Table VIII-E-23
Degrees of Poor


Phrase                        Scale Value

Poor                             1.60
Quite poor                       1.30
Very poor                        1.18
Unusually poor                    .95
Extremely poor                    .95
Completely poor                   .92
Very, very poor                   .55

From: Mosier (1941).
See Section VIII-A 16.


VIII-E Page 20
1 Jul 76


Table VIII-E-24
Descriptive Phrases


                                               Interquartile
Phrase                          Scale Value        Range

Complete                           8.85              .65
Extremely vital                    8.79              .84
Very certain                       8.SS             1.05
Very strongly                      8.40             1.04
Very crucial                       8.29             1.12
Very important                     8.22             1.16
Very sure                          8.15              .95
Almost complete                    8.06              .58
Of great importance                8.05              .91
Very urgent                        8.00              .90
Feel strongly toward               7.80             1.60
Essential                          7.58             1.85
Very vital                         7.55             1.05
Certain                            7.13             1.44
Strongly                           7.07              .67
Important                          6.83             1.14
Good                               6.72             1.20
Urgent                             6.41             1.53
Crucial                            6.39             1.73
Sure                               5.93             1.87
Vital                              5.92             1.63
Moderately                         5.24              .99
Now                                5.03              .53
As at present                      5.00              .50
Fair                               4.96              .77
Don't know                         4.82              .82
Undecided                          4.76             1.06
Don't care                         4.63             2.00
Somewhat                           3.79              .94
Indifferent                        3.70             2.20
Object strongly to                 3.50             6.07
Not important                      3.09             1.33
Unimportant                        1.94             1.42
Bad                                2.83              .93
Uncertain                          2.83             2.50
Doesn't make any difference        2.83             3.13
Not sure                           2.82             1.24
Not certain                        2.64             2.62

(Table continued on next page)


VIII-E Page 21
8 Mar 85
(s. 1 Jul 76)


Table VIII-E-24 (Cont.)
Descriptive Phrases


                                               Interquartile
Phrase                          Scale Value        Range

Non-essential                      2.58             1.67
Doesn't mean anything              2.50             2.71
Insignificant                      2.12             1.14
Very little                        2.08              .64
Almost none                        2.04              .57
Very unimportant                   1.75             1.25
Only as a last resort              1.70             7.30
Very bad                           1.50             1.13
None                               1.11              .59

From: Dodd and Gerberick (1960).
See Section VIII-A 7.


VIII-F Page 1
1 Jul 76


F. Sample Sets of Response Alternatives

It is sometimes valuable, and a time-saver, to have lists of response
alternatives available for use. The tables in this section give some
examples of response alternatives that have been selected on different
bases. These sets do not exhaust all possibilities.

The sets of response alternatives that appear in Table VIII-F-1 were
selected so that the phrases in each set would have means at least one
standard deviation away from each other and have parallel wording.
Some of the sets of response alternatives have extreme endpoints; some
do not.

The sets of response alternatives shown in Table VIII-F-2 were
selected so that the phrases in each set would be as nearly equally
distant from each other as possible, without regard to parallel wording.

Table VIII-F-3 contains sets of response alternatives selected from
lists of descriptors with only scale values given. The phrases were
selected on the basis of equal-appearing intervals.

Table VIII-F-4 has sets of response alternatives selected from order
of merit lists of descriptors.
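
The selection rule described for Table VIII-F-1 can be checked
mechanically. The following is a minimal sketch in Python and is not part
of the original procedure. It assumes that "at least one standard
deviation away" means that the gap between adjacent means is at least as
large as the larger of the two standard deviations, which is only one
reasonable reading of the rule. The function name and the tuple layout are
hypothetical; the means and standard deviations are taken from
Table VIII-E-11.

```python
# Sketch: check whether a set of response alternatives is spaced at least
# one standard deviation apart.
# ASSUMPTION: the gap between adjacent means must be >= the larger of the
# two phrases' standard deviations. The manual does not spell this out.

def is_separated(phrases):
    """phrases: list of (label, mean, sd) tuples, in any order."""
    ordered = sorted(phrases, key=lambda p: p[1], reverse=True)
    for (_, mean_hi, sd_hi), (_, mean_lo, sd_lo) in zip(ordered, ordered[1:]):
        if mean_hi - mean_lo < max(sd_hi, sd_lo):
            return False
    return True


# Means and SDs copied from Table VIII-E-11 (comparison phrases).
candidate_set = [
    ("Undoubtedly best", 4.569, 0.823),
    ("Conspicuously better", 3.059, 0.802),
    ("Moderately better", 2.255, 0.737),
    ("Alike", 0.216, 0.847),
    ("Moderately worse", -2.220, 0.944),
    ("Conspicuously worse", -3.275, 0.887),
    ("Undoubtedly worst", -4.510, 0.872),
]

# Two phrases with the same mean cannot both appear in one set.
too_close = [
    ("Considerably worse", -3.275, 1.206),
    ("Conspicuously worse", -3.275, 0.887),
]

print(is_separated(candidate_set))
print(is_separated(too_close))
```

Under this reading, a candidate set passes only if every adjacent pair of
phrases clears the threshold, so a single close pair is enough to reject
the set.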


VIII-F Page 2
1 Jul 76


Table VIII-F-1

Sets of Response Alternatives Selected So Phrases Are at Least
One Standard Deviation Apart and Have Parallel Wording


Set No.    Response Alternatives

 1.        Completely acceptable
           Reasonably acceptable
           Barely acceptable
           Borderline
           Barely unacceptable
           Reasonably unacceptable
           Completely unacceptable

 2.        Wholly acceptable
           Largely acceptable
           Borderline
           Largely unacceptable
           Wholly unacceptable

 3.        Largely acceptable
           Barely acceptable
           Borderline
           Barely unacceptable
           Largely unacceptable

 4.        Reasonably acceptable
           Slightly acceptable
           Borderline
           Slightly unacceptable
           Reasonably unacceptable

 5.        Totally adequate
           Very adequate
           Barely adequate
           Borderline
           Barely inadequate
           Very inadequate
           Totally inadequate

 6.        Completely adequate
           Considerably adequate
           Borderline
           Considerably inadequate
           Completely inadequate

 7.        Very adequate
           Slightly adequate
           Borderline
           Slightly inadequate
           Very inadequate

 8.        Highly adequate
           Mildly adequate
           Borderline
           Mildly inadequate
           Highly inadequate

 9.        Decidedly agree
           Substantially agree
           Slightly agree
           Slightly disagree
           Substantially disagree
           Decidedly disagree

10.        Moderately agree
           Perhaps agree
           Neutral
           Perhaps disagree
           Moderately disagree

11.        Undoubtedly best
           Conspicuously better
           Moderately better
           Alike
           Moderately worse
           Conspicuously worse
           Undoubtedly worst

12.        Moderately better
           Barely better
           The same
           Barely worse
           Moderately worse

(Table continued on next page)


VIII-F Page 3
1 Jul 76


Table VIII-F-1 (Cont.)

Sets of Response Alternatives Selected So Phrases Are at Least
One Standard Deviation Apart and Have Parallel Wording


Set No.    Response Alternatives

13.        Extremely good
           Remarkably good
           Good
           So-so
           Poor
           Remarkably poor
           Extremely poor

14.        Exceptionally good
           Reasonably good
           So-so
           Reasonably poor
           Exceptionally poor

15.        Very important
           Important
           Not important
           Very unimportant

16.        Like extremely
           Like moderately
           Neutral
           Dislike moderately
           Dislike extremely

17.        Strongly like
           Like
           Neutral
           Don't like
           Strongly dislike

18.        Very much more
           A good deal more
           A little more
           A little less
           A good deal less
           Very much less


VIII-F Page 4
1 Jul 76


Table VIII-F-2

Sets of Response Alternatives Selected So That
Intervals Between Phrases Are as Nearly Equal as Possible


Set No.    Response Alternatives

 1.        Completely acceptable
           Reasonably acceptable
           Borderline
           Moderately unacceptable
           Extremely unacceptable

 2.        Totally adequate
           Pretty adequate
           Borderline
           Pretty inadequate
           Extremely inadequate

 3.        Highly adequate
           Rather adequate
           Borderline
           Somewhat inadequate
           Decidedly inadequate

 4.        Quite agree
           Moderately agree
           Perhaps agree
           Perhaps disagree
           Moderately disagree
           Substantially disagree

 5.        Undoubtedly best
           Moderately better
           Borderline
           Noticeably worse
           Undoubtedly worst

 6.        Fantastic
           Delightful
           Nice
           Mediocre
           Unpleasant
           Horrible

 7.        Perfect in every respect
           Very good
           Good
           Could use some minor changes
           Not very good
           Better than nothing
           Extremely poor

 8.        Excellent
           Good
           Only fair
           Poor
           Terrible

 9.        Extremely good
           Quite good
           So-so
           Slightly poor
           Extremely poor

10.        Remarkably good
           Moderately good
           So-so
           Not very good
           Unusually poor

11.        Without hesitation
           With little hesitation
           With some hesitation
           With great hesitation

12.        Strongly like
           Like quite a bit
           Like
           Neutral
           Mildly dislike
           Dislike very much
           Dislike extremely

(Table continued on next page)


VIII-F Page 5
1 Jul 76


Table VIII-F-2 (Cont.)

Sets of Response Alternatives Selected So That
Intervals Between Phrases Are as Nearly Equal as Possible


Set No.    Response Alternatives

13.        Like quite a bit
           Like
           Like slightly
           Borderline
           Dislike slightly
           Dislike moderately
           Don't like

14.        Like quite a bit
           Like fairly well
           Borderline
           Dislike moderately
           Dislike very much

15.        Very much more
           A little more
           Slightly less
           Very much less


VIII-F Page 6
1 Jul 76


Table VIII-F-3

Sets of Response Alternatives Selected
from Lists Giving Scale Values Only


Set No.    Response Alternatives

 1.        Very, very agreeable
           Usually agreeable
           Quite agreeable
           Agreeable

 2.        Rather average
           Quite average
           Unusually average
           Decidedly average

 3.        Very, very desirable
           Completely desirable
           Very desirable
           Desirable

 4.        Extremely good
           Somewhat good
           Slightly bad
           Extremely bad

 5.        Slightly inferior
           Rather inferior
           Unusually inferior
           Extremely inferior

 6.        Extremely nice
           Decidedly nice
           Nice
           Slightly nice

 7.        Ordinary
           Slightly ordinary
           Unusually ordinary

 8.        Extremely pleasant
           Decidedly pleasant
           Somewhat pleasant

 9.        Poor
           Very poor
           Very, very poor

10.        Very, very agreeable
           Extremely agreeable
           Very agreeable
           Quite agreeable
           Agreeable

Note. Selected so that intervals between phrases are as equal as
possible.


VIII-F Page 7
1 Jul 76


Table VIII-F-4

Sets of Response Alternatives Selected
Using Order of Merit Lists of Descriptor Terms


Set No.    Response Alternatives

 1.        Very good
           Good
           Borderline
           Poor
           Very poor

 2.        Very satisfactory
           Satisfactory
           Borderline
           Unsatisfactory
           Very unsatisfactory

 3.        Very superior
           Superior
           Borderline
           Poor
           Very poor

 4.        Extremely useful
           Of considerable use
           Of use
           Not very useful
           Of no use

IX-A Page 1
8 Mar 85
(s. 1 Jul 76)


This chapter considers five topics related to the physical
characteristics of questionnaires: the location of response alternatives
relative to the stem (Section IX-B); questionnaire length (Section IX-C);
questionnaire format considerations (Section IX-D); the use of answer
sheets (Section IX-E); and the use of branching (Section IX-F).


IX-B Page 1
8 Mar 85
(s. 1 Jul 76)

B. Location of Response Alternatives Relative to the Stem

Research to determine what effect the location of response alternatives
relative to the question stem has on subjects' responses is practically
nonexistent. There is some evidence, however, that untrained raters
can make relatively error-free graphic ratings regardless of whether
the "good" end of the scale is at the left, right, top, or bottom.

In designing a specific questionnaire, the following points should be
considered regarding the location of response alternatives relative to
the stem:

1. With multiple choice items, the response alternatives are usually
arranged vertically under the stem as shown in Figure IV-C-1. With
a large number of response alternatives, two or more columns of
vertically arranged alternatives might be used. Sometimes, if
there are only two or three alternatives (such as "Yes" and "No"),
they are placed horizontally rather than vertically.

2. Graphic rating scales are usually placed horizontally on a page.
However, the descriptive words, phrases, or sentences on a scale
should be concentrated as much as possible at specific points on
the scale. This is usually easier if the scales are placed
vertically on the page, but it can be done either way. Descriptors
need not be equally spaced along graphic scales, and should not be
if there is reason to believe the psychological distances between
them are not equal.

3. With nongraphic (or "numerical") rating scale items and with
ranking and forced choice items, the response alternatives are usually
placed vertically under the question stem. See examples in Chapter
IV. Sometimes rating scale items are placed horizontally under the
stem as shown in Figure VII-B-1. If a number of rating scale items
all use the same response alternatives, the question stems can be
presented in a column with the response alternatives to the right
as shown in Figure IX-B-1.

In Figure IX-B-1, the response alternatives have been rotated 90
degrees to save space. An effort should be made to place the
response alternatives horizontally, parallel with the bottom of the
page, so that the respondent does not need to turn the page sideways
to read them.

4. The response alternatives for semantic differential items are
usually placed horizontally on the page. For an example, see
Figure IV-I-1.

5. Use precoded cards with alphabet letters for responses to items
with sensitive content that might be viewed as threatening. This
would be appropriate for questionnaires administered by personal
interview. The alphabet letter selected by the respondent can
later be transposed to a number for analysis purposes. This
technique is used to obtain less distortion and to reduce the social
desirability response set.


IX-B Page 2
8 Mar 85
(s. 1 Jul 76)


Figure IX-B-1

Arrangement of Items With Same
Rating Scale Response Alternatives

1. How satisfied or dissatisfied are you with each of the following
   factors or things?

   [Column headings: the rating scale response alternatives, printed
   rotated 90 degrees to the right of the stems; not legible in this copy.]

   a. Type of furniture in barracks.
   b. Medical service to soldiers.
   c. Quality of mess hall food.
   d. Leadership of generals.
   e. Opportunity for promotion.
   f. Army pay.
   g. Civilian opinion of Army.

6. For respondents with a low education level, an easy format with
stems and anchors that are easy to understand is essential. Sometimes,
respondents have a preference for questionnaire formats with which
they have had previous experience. However, preference for specific
kinds of formats does not mean that the results will be more
reliable. In some studies, respondents were more accurate in their
ratings with less preferred formats.
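
The transposition of precoded card letters to numbers, described in
point 5 above, can be sketched as follows. This is a minimal illustration
in Python, not a procedure from this manual. The card layout (A for the
most favorable alternative through E for the least favorable), the
function names, and the treatment of blank or unrecognized answers are
all assumptions made for the example.

```python
# Sketch: transpose letter responses from precoded cards to numbers.
# HYPOTHETICAL card layout: A = most favorable ... E = least favorable.
import string


def build_code_map(n_alternatives, high_first=True):
    """Map the first n letters of the alphabet to scale numbers."""
    letters = string.ascii_uppercase[:n_alternatives]
    if high_first:
        values = range(n_alternatives, 0, -1)   # A -> n, B -> n-1, ...
    else:
        values = range(1, n_alternatives + 1)   # A -> 1, B -> 2, ...
    return dict(zip(letters, values))


def recode(responses, code_map):
    """Convert letters to numbers; blank or unknown answers become None."""
    return [code_map.get(r.strip().upper()) if r else None for r in responses]


code_map = build_code_map(5)
recoded = recode(["A", "c", "E", "", "Z"], code_map)
print(recoded)
```

Keeping the letter-to-number table in one place means the respondent only
ever sees letters, while the analyst can change the direction of scoring
without touching the recorded answers.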


IX-C Page 1
8 Mar 85
(s. 1 Jul 76)


C. Questionnaire Length

1. General

The length of questionnaires used in field tests has ranged from
one page to as many as 30 pages, perhaps more. How long can one
expect a respondent to work effectively at the questionnaire-answering
task? At what point do attention and motivation start to
degrade, thereby producing poorly considered responses or the
omission of responses? Research information on this point is not
available to provide a basis for a firm recommendation. There is
even disagreement on the effect of questionnaire length on the
response rate to mailed questionnaires. The number of items and
number of pages in a questionnaire may not necessarily be related
to response rate for mailed questionnaires.

However, questionnaires which require longer than one hour to
complete will, in most situations, cause boredom and indifference.
Even 10 or 15 minutes may be too long if the questionnaire is
perceived by the respondent as redundant or asking unnecessary
questions. If one is concerned about the effects of a long
questionnaire, alternate forms should be used, wherein the order of
items is reversed (or approximately so). For example, the items
answered last on 50% of the forms would be answered first on the
other 50% of the forms. One could also split the respondent group
in half and give half of the questions to each group, provided that
the two groups were fairly equivalent in relevant characteristics.
Splitting the respondent group in half increases the complexity of
the survey and may affect the precision of the measures. It is
assumed that everything else would already have been done to reduce
the number of items before one of these approaches is used.
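
The alternate-forms approach described above can be sketched in a few
lines. This is a minimal illustration in Python under stated assumptions:
the item labels, the respondent identifiers, and the rule of alternating
forms down the roster are all hypothetical, and a real study might prefer
random assignment.

```python
# Sketch: build two alternate forms with reversed item order and give each
# form to about half of the respondents. All names here are hypothetical.

def make_alternate_forms(items):
    """Return (form_a, form_b); form_b presents the items in reverse order."""
    form_a = list(items)
    form_b = list(reversed(form_a))
    return form_a, form_b


def assign_forms(respondent_ids):
    """Alternate forms down the roster so each goes to about half the group."""
    return {rid: ("A" if i % 2 == 0 else "B")
            for i, rid in enumerate(respondent_ids)}


items = [f"Item {n}" for n in range(1, 11)]
form_a, form_b = make_alternate_forms(items)
assignment = assign_forms(range(100, 106))

print(form_a[0], "/", form_b[0])
print(assignment)
```

With this arrangement, the items that come last on Form A come first on
Form B, so any fatigue effect falls on different items in each half of
the sample.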

For questionnaires administered by interview, survey guidelines
were established by the Federal Office of Management and Budget.
They suggest that interviews should not take longer than half an
hour, although there may be valid reasons why more information
would be required. This would, of course, extend the interview
length. Many surveys take an hour or longer to complete. There
are no firm guidelines. Pretesting the questionnaire may provide
data on the effect of number of items on the response rate.

2. Results of a Study

In a 1976 study, ARI assisted TCATA in obtaining and analyzing
questionnaire responses from a group of trainees for whom the duration
and location of basic and advanced individual training were handled
differently from the usual. The number of trainees answering Items
1-7 and 48-54 of a 54-item questionnaire is shown below. Note that
there is very little drop in the number of men in either group as
we skip from Items 1-7 to Items 48-52. This suggests that a
50-item questionnaire, administered as this was, was not so long that
persons stopped responding after answering successively more
questions.


IX-C Page 2
8 Mar 85
(s. 1 Jul 76)

Now note the sharp drop (about 15% and 9% for the two groups) in
responses to Items 53 and 54. A more gradual decrease in the number of
people responding is more what one would expect if they were being
"worn down" or fatigued by excessive length.

This result was puzzling, but then it was noted that Items 53 and
54 are alone together on the tenth and final page of the
questionnaire. It is speculated that many or most of those not answering
Items 53 and 54 turned page 10 over along with page 9 and thought
they had answered all that was required of them. No one checked
their questionnaires when they were handed in to see if they had
left any items blank. The reduction in respondents appears to be more
of a "last page phenomenon" than a consequence of an excessively
long questionnaire.


Item #      Experimental Group      Control Group

   1               716                   512
   2               716                   513
   3               717                   511
   4               714                   513
   5               716                   514
   6               713                   510
   7               716                   511
  48               707                   509
  49               707                   508
  50               707                   508
  51               707                   510
  52               698                   505
  53               593                   462
  54               604                   461
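
The size and location of the drop can be confirmed directly from the
tabulated counts. The following is a minimal sketch in Python, offered
only as an illustration; the function names are hypothetical, and the
drop is measured from the preceding tabulated item, which is an
assumption about how the percentages in the text were figured.

```python
# Sketch: find where the number of respondents falls most sharply.
# Counts are copied from the table above (Items 1-7 and 48-54).
ITEMS = [1, 2, 3, 4, 5, 6, 7, 48, 49, 50, 51, 52, 53, 54]
EXPERIMENTAL = [716, 716, 717, 714, 716, 713, 716,
                707, 707, 707, 707, 698, 593, 604]
CONTROL = [512, 513, 511, 513, 514, 510, 511,
           509, 508, 508, 510, 505, 462, 461]


def percent_drop(before, after):
    """Percentage decrease from one tabulated item to the next."""
    return 100.0 * (before - after) / before


def largest_drop(items, counts):
    """Return (item, percent) for the largest drop from the preceding item."""
    drops = [(items[i], percent_drop(counts[i - 1], counts[i]))
             for i in range(1, len(counts))]
    return max(drops, key=lambda d: d[1])


for name, counts in (("Experimental", EXPERIMENTAL), ("Control", CONTROL)):
    item, pct = largest_drop(ITEMS, counts)
    print(f"{name}: largest drop at Item {item} ({pct:.1f}%)")
```

A single abrupt step of this kind, rather than a steady decline across
items, is the pattern one would look for when a page has been skipped.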

IX-D Page 1
8 Mar 85
(s. 1 Jul 76)


D. Questionnaire Format Considerations


This section addresses the format of questionnaire items, title and
other identification marks, printed introductions, planning to
facilitate processing, and other questionnaire format considerations.


1. Format of Questionnaire Items and Format Bias


Item format biases occur when responses to items (questions) are
influenced by the question stem or response alternatives. The
following guidance is provided:


a. The format of all questionnaire items on a questionnaire should
be consistent whenever possible. Mixing multiple choice
questions, open-ended questions, scales, etc. is normally not
desirable.


b. Punctuation and question structure should be consistent and in
accordance with proper sentence structure principles. Where
incomplete sentences (e.g., "The training that I have received
at Fort Hood has been" with five response alternatives of "very
challenging" through "very unchallenging") are used as stems,
no extraneous punctuation, such as a colon, need be put at the
end of the stem. The first word of a response alternative
should not be capitalized unless it would be if the statement
were written as a continuous sentence. Terminal punctuation at
the end of the response alternatives should follow the same
general rule of consistency with normal sentence structure.
Hence, a period would ordinarily be placed after each response
alternative.


When an item consists of a complete question (e.g., "How
satisfied or dissatisfied are you with the furniture in the
barracks?"), the first word of the response alternatives should
be capitalized since it does not continue a sentence. If the
response alternatives constitute complete sentences, then they
should have periods at the end, or whatever other terminal
punctuation is appropriate. Sometimes periods are placed at
the end of extremely long response alternatives even if they
are not sentences. Ordinarily, then, with this form of item,
periods would not be placed after the response alternatives.

Exceptions to the above suggestions should be made whenever the
exception would improve clarity. An example might be when
periods would be confused with decimal points.

c. When items are ambiguous, a recognizable pattern of
inappropriate responses is often produced.

d. Item format bias may be a function of how items are sequenced
and grouped.


IX-D Page 2
8 Mar 85
(s. 1 Jul 76)

e. Some authors conclude that a bias can be expected from all
closed-end questions where answers must be selected from two or
more fixed choices.

f. The paired-comparison format may be useful for those
respondents who tend to check many items from a list, and for those
who check only a few.

g. Card sorting may show the least item format bias.

h. With two-way choices, some respondents have a tendency to
select the first alternative. Others have a tendency to select
the second. With other multiple choice items, some respondents
have a tendency to select certain categories.

i. There is some evidence that the first response alternative to a
question is chosen somewhat more frequently than the others.

j. Two studies were conducted by Mayer and Piper (1982) regarding
the physical layout of a questionnaire. Questionnaire layout can
be confusing to respondents. The wrong categories were
initially marked by respondents, indicating erroneous brand
preferences. An example is provided in Figure IX-D-1 to illustrate
how modification of the questionnaire format facilitated
clarification. The questionnaire layout that confused respondents
did not have a response alternative for "Other brand." The
layout was identical to that of Brand A through Brand G
response alternatives. There was no bracketed response
alternative for "Other brand."

Mayer, C. S., & Piper, C. (1982). A note on the importance of
layout in self-administered questionnaires. Journal of Marketing
Research, 19(3), 390-391.


IX-D Page 3
8 Mar 85
(s. 1 Jul 76)


Figure IX-D-1

Original Questionnaire Format

                   Product X     Product Y     Product Z

Brand F ---          ( )6          ( )6          ( )6
Brand G ---          ( )7          ( )7          ( )7
Other brand
(SPECIFY)

Modified Questionnaire Format

                   Product X     Product Y     Product Z

Brand F ---          ( )6          ( )6          ( )6
Brand G ---          ( )7          ( )7          ( )7
Other brand          ( )8          ( )8          ( )8


2. Title and Other Identification Marks

Each questionnaire should carry a descriptive title centered at the
top of the first page of questions and on the instructional and/or
introductory cover page if such is used. Each questionnaire form
should also be designated by form number to distinguish it from
other forms. This number usually goes in the upper left-hand
corner of each page.

3. Printed Introductions

Introductions are sometimes printed at the start of a questionnaire
to tell respondents the purpose and importance of the
questionnaire, and the importance of their cooperation in answering all
questions carefully. Methodological research is needed to
determine the effectiveness of such introductions, but if they are too
lengthy, there is always the possibility that they might be
counterproductive. Regardless, if the introduction is going to run
more than a quarter of a page, it might better be placed on a cover
sheet.

See Section X-B regarding the content of questionnaire
instructions.


IX-D Page 4
8 Mar 85
(s. 1 Jul 76)

4. Planning to Facilitate Processing

Where possible, questionnaires should be designed to facilitate
data collection, reduction, and analysis. This frequently involves
formulating the questionnaire for machine processing. For small
samples, however, manual processing should normally be employed
since the effort needed to plan for machine processing is not
justified by anticipated data reduction time savings. How to
format a questionnaire for machine processing is outside the
current scope of this manual. See Section IX-E regarding the use of
answer sheets.

5. Other Questionnaire Format Considerations

a. If the respondent's name, rank, etc. is really needed, ask for
it on the front page. (See also Section X-C.) Sometimes other
information is needed about respondents so that it can be
correlated with their responses. This may include duty MOS,
special army training, combat experience, etc. If it is really
needed, it is usually asked for on the front page along with
name.

b. If a questionnaire has over two pages, page numbers should be
used. They are ordinarily put at the center bottom of each
page.

c. A questionnaire should not be crowded or cluttered in
appearance. If it is, certain items might be missed.

d. Each item in a questionnaire should be numbered or lettered so
it can be identified and referred to.

e. Sufficient room should be left for the respondent to write in
answers to open-ended questions.

f. Directions should be well displayed and unmistakably clear.

g. There is research evidence that an attractive questionnaire
increases response rates.

h. Different colored pages or questionnaire forms may aid in the
sorting of data and may have appeal to the respondents.




IX-E  Page  1 
8  Mar  85 
(s.  1  Jul  76) 


E.  Use  of  Answer  Sheets 

As noted in Section IX-D 4, when possible, questionnaires should be
designed to facilitate data collection, processing, and analyses.

Hence, if the number of questions warrants it, consideration should be
given to the use of separate answer sheets. An answer sheet can be
designed for either hand or machine processing.

When considering the possible use of answer sheets, the following
points should be kept in mind:

1. The use of a separate answer sheet may require additional or
different abilities than responding on the questionnaire itself.

2. Depending upon their prior experiences with them, respondents may
find it more difficult to use a separate answer sheet than to
respond on the questionnaire sheet.

3. It is normally more difficult and time-consuming for the respondent
to use a separate answer sheet. However, separate answer sheets
have been used successfully for some purposes.

4. When separate answer sheets are employed, the questionnaire
booklets are reusable.

5. Respondents sometimes err in using the last spaces on a multiple
choice answer sheet when there are more spaces than response
alternatives. This can be avoided by the use of tailor-made sheets.
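
As an illustration of point 5, the following is a minimal sketch of how a
tailor-made answer sheet might be generated so that each item offers exactly
as many spaces as it has response alternatives. The function name, the
layout, and the sample item counts are assumptions made for this example;
they are not taken from this manual.

```python
def answer_sheet(alternatives_per_item):
    """Build answer-sheet lines with exactly as many spaces as each item needs.

    alternatives_per_item: list of ints, one per question, in question order.
    """
    letters = "abcdefghijklmnopqrstuvwxyz"
    lines = ["NUMBER  (CIRCLE ONE)"]
    for number, count in enumerate(alternatives_per_item, start=1):
        if not 1 <= count <= len(letters):
            raise ValueError(f"item {number}: unsupported number of alternatives")
        # Only the first `count` letters are printed, so no extra spaces exist.
        lines.append(f"{number:02d}      " + "  ".join(letters[:count]))
    return lines


# Hypothetical questionnaire: item 1 has 5 alternatives, item 2 has 2, item 3 has 4.
sheet = answer_sheet([5, 2, 4])
print("\n".join(sheet))
```

Because the sheet is built from the item definitions, a respondent cannot
mark a space that has no corresponding alternative.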




IX-F  Page  1 
8  Mar  85 


F. Use of Branching


Some questionnaires are constructed so that respondents need not answer
every question in the survey. Branching is used to guide respondents
through a survey instrument to appropriate questions. This technique
requires the construction of questions which are integrated and then
arranged to implement the purposes of the branching.

1. When to Use Branching

Branching is used when the researcher wants to screen respondents
and assign them to subgroups. It is also a way to guard against
having the respondents' answers be influenced by the question(s)
that are bypassed in the branching (or branched around). This is
known as position effect. The questionnaire educates the respondent
on the topic which it covers. Branching can be used to measure
the effects of the questionnaire educating the respondent.

The more forward branching that occurs, the less education is being


Branching can be used to reduce interview time for questionnaires
administered by interview. Clear branching instructions are required
for the interviewer, as well as interviewer training.
Self-administered/group-administered questionnaires which use branching
can reduce the time to complete for the respondents. However,
there is a greater risk of item nonresponse following a branch for
these questionnaires.

2. Filter Questions


Filter questions are developed to determine how respondents are to
be guided through the questionnaire. Filter questions screen out
respondents from certain sets of questions on the questionnaire.
Branching is used so that the respondent can be routed into another
subset of questions. Consequently, the survey functions as a set
of filters through which some respondents pass while others are
detained or routed into different topic area(s).
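
The routing described above can be sketched in a few lines of code. This is
only an illustration under stated assumptions: the item identifiers, the item
wording, and the `route` function are hypothetical and do not come from this
manual. A filter question carries a table of responses and the item each
response leads to; every other item simply names the item that follows it.

```python
# Hypothetical items; "Q1" is the filter question.
QUESTIONS = {
    "Q1": {"text": "Have you driven the vehicle during this test? (yes/no)",
           "next": {"yes": "Q2", "no": "Q4"}},
    "Q2": {"text": "How many hours did you drive?", "next": "Q3"},
    "Q3": {"text": "Rate the driver's seat comfort (1-5).", "next": "Q4"},
    "Q4": {"text": "Rate the overall noise level (1-5).", "next": None},
}


def route(answers, start="Q1"):
    """Return the ordered list of items a respondent should see.

    `answers` maps item id -> response. A filter question stores a dict
    under "next" (response -> following item); other items store a single
    following item id, or None at the end of the questionnaire.
    """
    path = []
    current = start
    while current is not None:
        path.append(current)
        nxt = QUESTIONS[current]["next"]
        if isinstance(nxt, dict):
            # A missing or unexpected filter response ends the routing here.
            nxt = nxt.get(answers.get(current))
        current = nxt
    return path


drivers = route({"Q1": "yes"})      # passes through the filter
non_drivers = route({"Q1": "no"})   # branched around Q2 and Q3
print(drivers)
print(non_drivers)
```

Writing the routing down as a table of this kind also makes it easy to check
that every filter response leads to an existing item before the
questionnaire is printed.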

3. Branching Applications


Respondents are routed through the questionnaire by presenting them
with more difficult or more concrete questions. This approach
forces the respondent to consider the topic area from many viewpoints.
Clear branching instructions are required for all questionnaires.
Items immediately following a branch tend to have an
increased rate of nonresponse; to this extent, the branching
instructions were unclear or not understood.
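
One way to check whether a branch is causing nonresponse is to compute the
item nonresponse rate only among the respondents who were routed to the item.
The sketch below assumes a simple record layout (one dictionary per
respondent, with blank items absent or set to None); the function name and
the sample data are hypothetical.

```python
def item_nonresponse_rate(records, item, eligible):
    """Share of eligible respondents who left `item` blank.

    records  : list of dicts mapping item id -> response (missing/None = blank)
    eligible : function(record) -> True if branching routed the respondent
               to `item`
    """
    routed = [r for r in records if eligible(r)]
    if not routed:
        return None  # nobody was routed to the item; the rate is undefined
    blank = sum(1 for r in routed if r.get(item) is None)
    return blank / len(routed)


# Hypothetical data: Q1 is a filter; only "yes" respondents should answer Q2.
records = [
    {"Q1": "yes", "Q2": 4},
    {"Q1": "yes"},            # routed to Q2 but skipped it
    {"Q1": "no"},             # correctly branched around Q2
    {"Q1": "yes", "Q2": 2},
]
rate = item_nonresponse_rate(records, "Q2", lambda r: r.get("Q1") == "yes")
print(f"Q2 nonresponse among routed respondents: {rate:.2f}")
```

Respondents who were correctly branched around an item are left out of the
denominator, so they are not counted as nonrespondents.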




IX-F Page 2
8 Mar 85

4. Recommendations

Surveys conducted by interview which use branching may be choppy.
Interviewers require smooth transitions between branches, and
training in conducting the survey. Branching is best used to
reduce interview time. Branching may be used to reduce/avoid
exposing the respondent to items that are irrelevant or
nonessential. This forces the interviewer to ask only pertinent
questions regardless of the interviewer's persuasion. Branching
for mail surveys and group-administered surveys has a greater
probability of increasing item nonresponse rate than a survey
conducted by interview.

There are alternatives to branching, such as the design of different
questionnaire packages for the different categories of
respondents. An illustration of this approach was used in the Army
Research Institute's test of the Bradley Fighting Vehicle. Four
separate questionnaires were designed: one for the driver, one for
the track commander, one for the gunner, and one for the remaining
personnel.

When branching questionnaires are used to measure respondent
attitudes, there is a greater possibility for introducing bias into
the data. Questionnaires covering topic areas dealing with fact
instead of attitude are preferred when branching is used. Branching
may be used to reduce/avoid exposing the respondent to items
that are irrelevant or nonessential. This forces the interviewer
to ask only pertinent questions regardless of the interviewer's
persuasion. See Sections VI-F-1 and X-D-3.




X-A Page 1
8 Mar 85
(s.  1  Jul  76) 


Chapter X: Considerations Related to Questionnaire Administration
A.  Overview 


Considerations related to the administration of questionnaires are
discussed in this chapter. Such matters are obviously of concern when
questionnaires are constructed. Questionnaire instructions are
discussed in Section X-B, anonymity for respondents in Section X-C, and
motivational factors related to questionnaire administration in Section
X-D. Administration time, characteristics of administrators, and
administrative conditions are the topics of Sections X-E, X-F, and X-G,
respectively. The training of raters and other evaluators is the
concern of Section X-H, while other factors related to questionnaire
administration are considered in Section X-I.




X-B  Page  1 
1  Jul  76 

B.  Instructions 

Care must be exercised in preparing instructions for questionnaires
since they are quite likely to affect the way the respondent answers
the questions. For example, even mildly anger-arousing printed
instructions may elicit responses of negativism.


Although further research is needed to fully determine the influence of
instructions on responses, some practical guidelines can be offered:

1. It is sometimes preferred that an oral statement of questionnaire
purpose be given to respondents. If this is not practical or a
person with appropriate credibility and/or status cannot be
supplied to make the statement, then a printed statement must suffice.
(See Section IX-D 3 regarding printed introductions.)

2. Lengthy instructions for completing questionnaires should be
avoided. They may tend to confuse the respondents rather than help
them.

3. The option of orally presenting instructions is often available.
When oral instructions are given, they are usually given just prior
to administering the questionnaire.

4. If instructions are given orally and an illustration is needed, a
visual display should be available which may include a printed
version of more complex instructions.

5. When questionnaires are group-administered, it should be announced
that aides will check each respondent's questionnaire for
completeness, if such a process can be implemented.

6. "Cute" examples on instructions should not be used. They will
damage rapport and detract from the seriousness of the
questionnaires, particularly for more mature and older respondents. It is
best to use a neutral example that will be suitable for all
respondents.

7. Obviously, instructions should be given in a way that all
respondents can understand them. Care should be exercised about the
level of vocabulary used.

An example is given on the following page of the instructions that
might precede the items of a questionnaire. In this example, the
responses were to be given on a separate "answer" or response sheet.




X-B  Page  2 
8  Mar  85 
(s. 1 Jul 76)


TRAINING ATTITUDE QUESTIONNAIRE (BASIC AND AIT)


INSTRUCTIONS: The purpose of this questionnaire is to obtain information
from you regarding training, working and living while in the Army's Basic
Training and Advanced Individual Training (AIT) program. Your answers will
help the Army to determine what conditions are in need of improvement, and
will assist the Army in determining the actions they must take to improve
training and the quality of life for new soldiers in the Army. Your honest
opinions are, therefore, essential.

We have no need to know who you are personally. No effort will be made to
identify either you or your unit. DO NOT WRITE YOUR NAME, SOCIAL SECURITY
NUMBER, OR UNIT on either the questionnaire or the answer sheet.

Each  question  should  be  answered  by  circling  the  letter  on  your  answer 
sheet  which  is  next  to  the  answer  which  best  describes  your  feelings.  See 
sample  question  below: 

SAMPLE  QUESTION:  3.  How  old  are  you? 

a.  17 

b.  18 

c.  19 

d.  20 

e.  21  or  older 

If you are 19 years old, you should circle the letter c on your answer
sheet for question 3, as has been done below, since the letter c
corresponds to your correct age of 19 on the questionnaire.


NUMBER    (CIRCLE ONE)

01        a   b   c   d   e
02        a   b   c   d   e
03        a   b  (c)  d   e
04        a   b   c   d   e

If  you  have  any  questions,  please  ask  the  questionnaire  administrator  for 
assistance.  You  will  have  30  minutes  to  complete  the  questionnaire.  You 
will  all  turn  in  your  answer  sheets,  and  leave  at  the  same  time.  Do  not 
turn  the  page  and  start  to  work  until  instructed  to  do  so. 




X-C  Page  1 
8  Mar  85 
(s.  1  Jul  76) 


C. Anonymity for Respondents
1.  Factors  to  be  Considered 


There are several factors to be considered when deciding whether to
require the respondent's name or other identifying information on a
questionnaire. Some of the factors are supported by research,
while others are not.

a. If respondents supply their names, they are aware that
they can be identified and called back. If respondents do not
have to give their names or similar information, most will
believe that they cannot be identified and called back for any
type of accounting after their questionnaires have been
collected.

b. The perception of anonymity seems to depend not only upon
whether respondents give their names, but also on the
conditions under which the questionnaires are administered. For
example, paper-and-pencil questionnaires are more anonymous
than structured interviews.

c. The effects of anonymity seem to be related to the content of
the questionnaire. This is particularly true when information
on sensitive areas is collected. For general attitudes, it may
not matter.

d. The effects of anonymity may also depend upon who administers
the questionnaire, and the circumstances under which it is
administered. Responses may be distorted when respondents are
identified and under high threat.

e. Respondents may be more lenient when rating other personnel if
they think they will be identified.

2.  Implications  of  the  Privacy  Act  of  1974 

If the experimenter, test officer, or questionnaire writer desires
to obtain certain types of personal information from a respondent,
the federal Privacy Act of 1974, in turn, requires that certain
information first be given to the candidate respondent. One may
use DA Form 4368-R, 1 May 75 for the purpose of communicating this
information to the respondent. The form is shown filled out on
page X-C 4. In this particular example, the research questions
dealt with attitudes toward respondents' treatment in the Army.


X-C  Page  2 
8 Mar 85
(s.  1  Jul  76) 


A second example, Figure X-C-1, illustrates a more compact format.
The same elements of information called for by DA Form 4368-R have
been communicated; it's just that that form was not used.

A privacy act statement is not necessarily required as a part of
all questionnaires that are administered to Army personnel. It is
not necessary where only the personal information listed below is
being requested. For example, no invasion of privacy is involved
where soldiers are asked to evaluate some new/revised weapon,
equipment, or organization regarding effectiveness and/or
acceptability, and to answer any of the 12 items listed below. The
collection or release of the following information does not require
the consent of the respondent:

a.  Grade. 


b.  Date  of  birth. 

c.  Date  of  rank. 

d.  Salary. 

e.  Present  duty  assignment. 

f.  Past  duty  assignments. 

g.  Future  assignments  (approved). 

h.  Unit  and/or  office  address. 


i. Unit and/or office phone number.

j. Source of commission.

k.  Military  and/or  civilian  education. 

l.  Promotion  sequence. 


Data collection procedures that guarantee anonymity are desirable
for surveys. If the research methods cannot guarantee anonymity,
then confidentiality of the data is to be protected. For
operational test and evaluation research, participants should be
informed that the data cannot be kept confidential. Surveys
requiring the names of participants can have records coded as soon as
possible. The key to the code can be stored for limited access to
protect confidentiality. (See American Psychological Association
(1982). Ethical principles in the conduct of research with human
participants. Washington, DC.)
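
The coding of records mentioned above can be sketched as follows. This is a
minimal illustration, not a prescribed procedure: the function name, the
field names, and the use of random hexadecimal codes are assumptions made
for the example. Each name is replaced by a code, and the key that links
codes to names is returned separately so that it can be stored apart from
the data with limited access.

```python
import secrets


def code_records(records, id_field="name"):
    """Replace the identifying field with a random code.

    Returns (coded_records, key). `key` maps code -> original identifier and
    is meant to be stored separately from the data, with limited access.
    """
    coded, key = [], {}
    for record in records:
        code = secrets.token_hex(4)
        while code in key:            # regenerate on the rare collision
            code = secrets.token_hex(4)
        key[code] = record[id_field]
        stripped = {k: v for k, v in record.items() if k != id_field}
        stripped["respondent_code"] = code
        coded.append(stripped)
    return coded, key


# Hypothetical records; only the coded version would go to analysis.
raw = [{"name": "Respondent A", "Q1": 4}, {"name": "Respondent B", "Q1": 2}]
coded, key = code_records(raw)
print(coded)
```

The codes are random rather than derived from the names, so the coded data
alone cannot be used to recover a respondent's identity.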


X-C  Page  3 
8  Mar  85 

If you have any questions concerning the Privacy Act of 1974, you
may obtain additional information from the ARI Field Unit. All
questionnaire respondents must be advised of the requirements of
the Privacy Act of 1974 when any of the 18 types of information
listed below are being requested. This information can only be
obtained from an individual on a voluntary basis. The release of
any of the information listed below requires the prior and informed
consent of the individual.

a. Name.

b. Social Security number.

c. Home address.

d. Home phone number.

e. Home of record.

f. Financial transactions.

g. Character quality.

h. Efficiency ratings.

i. Conduct ratings.

j. Legal affairs.

k. Religious preferences.

l. Number of allotments.

m. Amount of allotments.

n. Medical history.

o. Criminal history.

p. Fingerprints.

q. Voiceprints.

r. Photographs.
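
When a questionnaire is drafted, the background items it requests can be
screened against the two lists given in this section. The sketch below is
only an aid for that screening, under stated assumptions: the function name
and the output layout are hypothetical, and items that appear on neither
list are simply reported for a person to review, since this section does
not say how to treat them.

```python
# The 12 items whose collection does not require consent (from the text).
NO_CONSENT_REQUIRED = {
    "grade", "date of birth", "date of rank", "salary",
    "present duty assignment", "past duty assignments",
    "future assignments (approved)", "unit and/or office address",
    "unit and/or office phone number", "source of commission",
    "military and/or civilian education", "promotion sequence",
}

# The 18 items that call for a Privacy Act advisement (from the text).
ADVISEMENT_REQUIRED = {
    "name", "social security number", "home address", "home phone number",
    "home of record", "financial transactions", "character quality",
    "efficiency ratings", "conduct ratings", "legal affairs",
    "religious preferences", "number of allotments", "amount of allotments",
    "medical history", "criminal history", "fingerprints", "voiceprints",
    "photographs",
}


def privacy_act_review(requested_items):
    """Sort requested items into those needing advisement and unlisted ones."""
    requested = {item.strip().lower() for item in requested_items}
    needs = sorted(requested & ADVISEMENT_REQUIRED)
    unlisted = sorted(requested - ADVISEMENT_REQUIRED - NO_CONSENT_REQUIRED)
    return {
        "statement_required": bool(needs),
        "items_requiring_advisement": needs,
        "unlisted_items": unlisted,   # neither list covers these; review by hand
    }


review = privacy_act_review(["Grade", "Name", "Shoe size"])
print(review)
```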




DATA REQUIRED BY THE PRIVACY ACT OF 1974
(5 U.S.C. 552a)


PRESCRIBING DIRECTIVE

AR 70-1


1. AUTHORITY

10 USC Sec 4503


2. PRINCIPAL PURPOSE(S)

The data collected with the attached form are to be used for
research purposes only.


3. ROUTINE USES

This is an experimental personnel data collection form developed by the
U.S. Army Research Institute for the Behavioral and Social Sciences pursuant
to its research mission as prescribed in AR 70-1. When identifiers (name or
Social Security Number) are requested they are to be used for administrative
and statistical control purposes only. Full confidentiality of the
responses will be maintained in the processing of these data.


4. MANDATORY OR VOLUNTARY DISCLOSURE AND EFFECT ON INDIVIDUAL NOT PROVIDING INFORMATION

Your participation in this research is strictly voluntary. Individuals are
encouraged to provide complete and accurate information in the interests of
the research, but there will be no effect on individuals for not providing
all or any part of the information. This notice may be detached from the
rest of the form and retained by the individual if so desired.


DA FORM 4368-R    Privacy Act Statement


X-C  Page  5 
8 Mar 85
(s. 1 Jul 76)


Figure  X-C-1 

An  Example  of  a  Privacy  Act  Statement 
11B/C GRADUATE FIELD SURVEY

(Prescribing Directive: AR 600-46; TRADOC Ltr dtd 29 Aug 75)


INFORMATION PRIVACY ACT STATEMENT

1. Authority: 5 USC 301, 10 USC 3012, Authority for the Secretary
of the Army to Issue AR's; 44 USC 3101, Authority for Collecting
Necessary Data.

2. Principal Purpose: To collect data to evaluate the effectiveness
of individual training received prior to joining one's
initial unit of assignment.

3.  Routine  Uses:  The  data  collected  with  this  form  are  to  be  used 
for  research  purposes  only.  They  will  not  become  a  part  of  any 
individual's  record  and  will  not  be  used  in  whole  or  in  part  in 
making  any  determination  about  an  individual. 

The identifiers (name or Social Security Number) are to be used
for administrative and statistical control purposes only. Full
confidentiality of responses will be maintained in the processing
of these data.

4. Mandatory or Voluntary Disclosure and Effect on Individual Not
Providing Information: Voluntary - Your participation in this
research is strictly voluntary. Individuals are encouraged to
provide complete and accurate information in the interests of
the research, but there will be no effect on individuals not
providing all or any part of the information.

This  notice  may  be  detached  from  the  rest  of  this  form  and 
retained  by  the  individual  answering  the  questionnaire  if  so 
desired. 




X-D Page 1
8  Mar  85 
(s.  1  Jul  76) 

D. Motivational Factors

This section considers the effects of lack of motivation, and some ways
of providing a desirable level of motivation to respondents during the
questionnaire administration process.

1.  Effects  of  Lack  of  Motivation 

Generally, the results of any study will suffer distortion if those
to whom the questionnaire is distributed are not sufficiently
motivated. If they have the choice, they will not respond at all. If
they do have to respond or are just minimally motivated, they may
omit items, make patterned or random responses, or just generally
respond poorly. As a result, the reliability and validity of the
responses will be decreased, and the results of the study would
lead their reader/user into some degree of error.

2. Ego Involving Potential Respondents in the Study

There are a number of ways that motivation can be increased by ego
involving potential respondents. Some of the ways are given below:

a. The special role of the respondent in the study can be
emphasized.

b. Responsibility can be stressed when it is appropriate to do so.

c. The wording of cover letters, if used, affects ego involvement.
Help may sometimes be requested on the basis of appealing to
the self-interests of the respondent. There is evidence that
this type of appeal helps most with less educated respondents.

3.  Stimulating  the  Return  of  Remotely  Administered  Questionnaires 

Obviously, whatever involves the egos of potential respondents in a
study also stimulates the return of remotely administered
questionnaires, such as those distributed by mail. Other ways of
stimulating the return or response rate are:

a. Return rates may often be significantly improved when a letter
is sent in advance notifying the potential respondents that
they will receive a questionnaire and their help is needed in
filling it out.




X-D Page 2
8  Mar  85 
(s.  1  Jul  76) 




b. Stamped and return addressed envelopes can be sent with the
questionnaire. There is evidence that this does increase
response rate.

c. There is contradictory evidence about whether short
questionnaires are returned more frequently than longer ones, but one
would probably believe this to be true.

d. Follow-up reminders can be sent to those who do not promptly
return their questionnaires. There is some question, however,
regarding how much such follow-ups increase response rate. At
times, it may not be cost effective, so maybe the decision
should be a function of whether or not the initial return rate
was adequate.

e. Telephone interviews and face-to-face interviews generally have
a higher response rate than mail surveys.

f. Response rate for telephone interviews can be increased by
changing the format from what would be used in a face-to-face
interview. Select fewer items and items which are shorter in
length for telephone interviews to reduce telephone
disconnects.

g. Nonresponse for items following a branch may increase the
overall item nonresponse rate, especially for mail surveys. If
branching can be avoided, this may increase item response rate.

4. Use of Incentives


The evidence has been mixed regarding the extent to which
motivation is increased through the use of incentives. Incentives may
include money, time off, special privileges, etc. Generally,
however, it is agreed that incentives usually help increase the
response rate with remotely administered questionnaires.




X-D Page 3
8 Mar 85
(s. 1 Jul 76)

5. Other Motivational Factors Related to Questionnaire Administration

Many additional motivational factors related to questionnaire
administration can be noted or inferred from other sections in this
manual. Some of them are:

a. Respondents often have preferences for certain item formats,
although sometimes such preferences may not offer any advantage
in terms of reliability and validity. Some subjects prefer
rating scales to forced choice items. With forced choice, some
like the option of indicating the degree of applicability of
each statement. Some do not like forced choice Q-sort (see
Section ). Some prefer multiple category to two category
options. In some studies, Likert scales have been preferred to
behavioral scales. Behaviorally Anchored Rating Scales have
been preferred to Mixed Standard Scales, etc. These
preferences may relate to familiarity of the respondent with given
item types. There is not much that the questionnaire designer
can do about such preferences, except to note that they exist.

b. Researchers in recent years have explored the cognitive
complexity of respondents to match them to formats which are
cognitively compatible. There have been problems with
replication for this research.

c. Motivation may be increased by offering feedback of study
results to the respondent.

d. Every effort should be made to praise the respondents or
potential respondents, to the extent that it is reasonable.

e. Long, vague, or boring questionnaire sessions should be
avoided, since they will decrease respondent motivation to
continue attending and providing "best" responses.

f. Questionnaire administration sessions should not be scheduled
when there are conflicts with other activities of greater
interest to the respondents. Nor, in general, should they be
scheduled very early or very late in the day.

g. Volunteers are usually more motivated to fill out
questionnaires than are nonvolunteers. However, their replies may be
more biased.

h. When respondents are told that they may leave as soon as they
have completed the questionnaire, they usually do a much more
hasty and unsatisfactory job than when they are given a
specific time for completion, and are told that they cannot leave
until the time period is up.

i. See Chapter XIV about the behavior of interviewers.




X-E Page 1
8 Mar 85
(s. 1 Jul 76)


E. Administration Time


Little is known about the effects of questionnaire administration time
on respondents' motivation, or of the effects of setting time limits
for completing questionnaires. The questionnaire administration period
should generally have been determined in advance by pretesting. There
will be some variability in the length of time taken to complete a
questionnaire. There is remarkable consistency among those who are
sincere in attempting to do an accurate and complete job of answering
all questions.

When a questionnaire is administered to a group of respondents, the
instructions should emphasize that all respondents will be given plenty
of time to answer the questions. As indicated earlier in X-D 5 h, the
instructions should not tell the respondents that they can leave as
soon as they have finished the questionnaire. Many will then cut short
their efforts to answer the questions. There is little hope of
obtaining carefully considered evaluative responses on a questionnaire if the
respondents know that the faster they finish the questionnaire, the
sooner they will be able to go home.

Questionnaire administration time is obviously related to questionnaire
length, which is the topic of Section IX-C.

One should try to determine empirically the maximum time needed to
complete a given questionnaire. If the questionnaire is
group-administered, the maximum time for the slowest respondents should usually be
used in scheduling the administration of the questionnaire.
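
As a small worked illustration, the scheduled session length can be taken
from pretest completion times by using the slowest respondent. The sketch
below makes two assumptions that are not part of this manual: the pretest
times are hypothetical, and the result is rounded up to a five-minute
scheduling block for convenience.

```python
import math


def scheduled_minutes(pretest_minutes, block=5):
    """Session length to schedule, from pretest completion times.

    Uses the slowest pretest respondent, rounded up to the next `block`
    minutes. The rounding block is an assumption, not from the manual.
    """
    if not pretest_minutes:
        raise ValueError("no pretest times supplied")
    slowest = max(pretest_minutes)
    return math.ceil(slowest / block) * block


# Hypothetical pretest completion times, in minutes.
times = [18.5, 22.0, 24.5, 27.0, 21.0]
session = scheduled_minutes(times)
print(session)
```

Here the slowest pretest respondent took 27 minutes, so the session would be
scheduled for 30 minutes.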




X-F  Page  1 
8  Mar  85 
(s.  1  Jul  76) 

F. Characteristics of Administrators

Little has been established in the research literature about how the
characteristics of questionnaire administrators affect the overall
process with nonremotely administered questionnaires. The following
items may be noted:

1. In most cases, it is felt that the sex of the administrator has no
effect on the responses received. There may, however, be certain
motivational effects.

2. The military rank of the administrator may have an effect on the
respondent, but no research has been performed to examine this.

3. Any effect that the race of the administrator has on the respondent
may also be a function of the content material of the
questionnaire, e.g., race would be expected to influence responses on a
race relations questionnaire more than on a questionnaire dealing
with rifle comparisons. The effects should probably be viewed as
the result of interaction between administrator and respondent
characteristics, and the questions being asked.

4. Implications exist for biasing survey results whenever surveys
incorporate face-to-face interviewing with individuals from
different ethnic backgrounds. Items with racial content used in a
questionnaire are especially sensitive to such biasing.

5. See Chapter XIV about the influence of an interviewer on the
interviewee.




X-G Page 1
1 Jul 76


G. Administration Conditions


Questionnaire administration conditions obviously cannot be controlled
with remotely administered questionnaires. With group-administered
questionnaires, the following guidance is offered:

1. Administration conditions should be provided which are most
appropriate to the particular type of respondent completing the
questionnaire.

2. Administration conditions have an effect on questionnaire
responses. For example, different responses may be obtained if the
questionnaire is filled out in a group situation on the job rather
than individually at home.

3. When personnel are being rated, different ratings may be obtained,
depending on how acquainted the rater and ratee are.

4. For Army field test evaluations, the circumstances under which
questionnaires must/can be administered will vary rather widely.
There may be times when no writing surface(s) or pencils are
available; clipboards and pencils should be supplied if this problem can
be anticipated. If the needed materials cannot be brought to the
respondents, then arrange to move them to a place where the
materials and other environmental conditions are satisfactory.

5. Respondents should be required to give their answers without being
influenced by other respondents. Achieving this requires
respondents to be somewhat separated and/or to have the administrator(s)
watching them. Simply instructing them not to consult with each
other is usually not sufficient.




X-H Page 1
8 Mar 85
(s. 1 Jul 76)


H. Training of Field Test Evaluators


An extended discussion of the training of raters and other test
evaluators is not undertaken in this manual. The following suggestions,
however, can be offered about the general training of the Army field
test evaluators. See Section X-B regarding questionnaire
administration instructions.

1. Impress on test evaluators that they are supposed to answer the
questionnaire based upon what they observe in the test. Stress the
need for evaluations based only upon what was seen during the test
exercise, regardless of any personal feelings or knowledge of
concepts or equipment as might exist in a true combat environment
(except in special instances where this is specifically asked for).
To help identify and reduce prejudgment, a broad question might be
included to permit the evaluators to express any biases they may
have. It may be a question such as "Based on your personal
experience, do you feel the 'OPST' is a useful approach to real daily
problems, i.e., outside a test exercise environment?" Such a
question would give the evaluators an outlet for preconceived
opinions and attitudes which otherwise would color their view of
the events observed during the exercise. On the other hand, in
some situations the evaluators might feel it necessary to defend
their personal judgment by biasing their answers to the remaining
questions!


2. Stress the importance of evaluators to the success of the test.
Perhaps briefly indicate some actions which have been taken to
implement concepts supported by evaluative data from previous
tests.

3. Permit evaluators (particularly after the pilot test) to sound off
about the forms and their perceived inadequacies, regardless of how
unreasonable these complaints might be. The goal is to have all
evaluators answering questionnaires understand that they are active
and important contributors rather than just a means of satisfying
some obscure test requirement.

4. Examine all turned-in questionnaires to ensure that they have been
filled out and understood. This procedure should continue
throughout the entire series of tests.

5. Stress the notion that complete honesty and objectivity are needed.
Sometimes evaluators try to please the test sponsors, to the
detriment of the test.



6. Indicate to evaluators, perhaps on the top of all questionnaires or
verbally, that they may make marginal notes clarifying their scale
value selection for any rating question. This will increase posttest
accuracy in identifying questions which are awkwardly scaled or
unclearly stated. This is particularly crucial during the pretesting
or pilot test. Notes regarding question structure should be made
immediately as they occur to the evaluator, or the difficulty is
likely to be forgotten.

7. Prior to having the evaluators complete questionnaires, ask all or
a sample of randomly selected evaluators to orally describe to the
other evaluators what they believe each question is asking. This
procedure will reduce differences between judges that arise from
varying semantic interpretations. By the time of the actual exercise,
all evaluators should generally agree, for example, on the meaning of
"command and control effectiveness," "fire power potential," etc.
If this is done, the criteria will have mutual acceptance. This
procedure is also useful during the pretest to assist in the
selection of item wording that will be understood by the respondents.

8. Evaluators should be forewarned about biases such as the halo
effect, central tendency, and others discussed in Chapter XII. If
it is explained to the evaluators that these are common biases to
which we are all subject, the evaluators will be better able to
consider the fairness and accuracy of their observations. Training
to reduce rating errors is especially effective when the training
is extensive and allows evaluators to practice. Evaluator
experience with the questionnaire may improve rating accuracy. Short
training programs may have little impact on rating quality. To
train evaluators, effective training should include observational
techniques in conjunction with written performance observations
between rating periods.

9. The independent, noncollaborative evaluation of each question
should be stressed.




X-I  Page  1 
1  Jul  76 


I.  Other  Factors  Related  to  Questionnaire  Administration 

Some other factors related to questionnaire administration that have
not been discussed in other sections of this manual are addressed
below:

1. Respondents may at times be influenced by the title of the
questionnaire. The word "test" should not be used in the title of a
questionnaire, as it may imply that it is a test of the respondent's
knowledge.

2. A problem with Army field test evaluations concerns undue influence
by the questionnaire administrator. It is sometimes necessary to
use line officers from the units of the test subjects as
questionnaire administrators. When outside administrators are used,
they must be carefully instructed to make no comments whatsoever
regarding their personal opinions of the items being evaluated. An
offhand comment by a company commander administrator to his/her
company regarding the "goodness" or "badness" of a piece of equipment
or concept being evaluated can exert an influence sufficient to
distort the results significantly from what they would otherwise
have been.

3. The manner in which test subjects are selected and utilized in
operational tests may affect the manner in which they respond to
questionnaire items. For example, separate groups with no prior
experience with either the test system or the current standard
system could evaluate each system. This would exclude pretest
biases, but test subjects would have no basis to compare the two
systems. Alternatively, the same group of test subjects could use
both systems in rotation. However, this procedure may result in a
bias for or against one or both systems as a function of which was
used first. In this respect too, personnel having extensive prior
experience with a current standard system may introduce their
pretest biases for or against that system when it is being evaluated
against a candidate replacement system. The consequence of such
considerations is that the type of system evaluation intended will
govern the way evaluators and/or test subjects are selected and
utilized. The methods of selection and utilization will influence
the way questionnaires must be designed, and in turn suggest the
types of problems likely to arise.




XI-A Page 1
1  Jul  76 


Chapter XI: Pretesting of Questionnaires

A.  Overview 

Even the most careful screening of a questionnaire by its developer or
by questionnaire construction experts will usually not reveal all of
its faults. Pretesting is an important and essential procedure to
follow before administering any questionnaire. Its purpose is, of
course, to find those overlooked problems and faults that would
otherwise reduce the validity of the information obtained from the
questionnaire responses. However, not just any pretest will do. One
must know how to pretest the items and what to look for.

Some guidelines for pretesting questionnaires are given in this
chapter. Pretesting may seem to some uninformed individuals to be a
waste of time, especially when the author may have asked several
people in his/her own office to critique the questions, or perhaps
even asked a questionnaire specialist to critique them. However,
pretesting is an investment that is well worthwhile. It is crucial if
the decision that will result from the questionnaire is of any
importance.




XI-B Page 1
8 Mar 85
(s. 1 Jul 76)


B. Guidelines for Pretesting Questionnaires

1. Before a pretest is conducted and a questionnaire is constructed,
hypotheses and questionnaire items are developed. The hypotheses
are presented to a group of individuals who are subject matter
experts. The group performs a preliminary assessment of the
hypotheses and items. Modification may be required regarding the
hypotheses and/or questionnaire items.

2. Initially, open-ended questions are established and placed into a
logical sequence. Pretesting may provide information that can be
used to convert open-ended questions to multiple choice questions
to facilitate data reduction and analysis. Instructions are
developed to accompany the questionnaire, and they are included as
part of the pretest. If branching is used, it should be kept to a
minimum.

3. It is important that the respondents employed in pretesting be
representative of the eventual target respondents. For example, if
infantry enlisted men will perform in a test and then take the
questionnaire, it should not be pretested with respondents who are
armored officers; even infantry officers would not be satisfactory.

4. The pretest is more useful if it is conducted by someone who knows
the operations to be performed in the test and who also knows the
subject matter that the questionnaire covers. It is best if the
question writer is knowledgeable about these operations and
conducts the pretest.

5. Early versions of questionnaires may contain instructions, item
stems, response alternatives, and item ordering that are confusing
to respondents. It is possible that more than one pretest will
need to be conducted. Some researchers have been known to conduct
up to six or more pretests.

6. Interview and pretest some of the pretest respondents one at a
time or in a group. Ask each respondent to read each question and
explain its meaning. Also ask them to explain the meaning of the
response alternatives, and to make their choice. Ask the
respondent to explain why a particular choice was made. The
respondents' answers will frequently reveal incorrect assumptions and
possible rationales that the question writer never dreamed possible.
They will also help to identify lack of understanding of particular
words, vague or ambiguous phrases, ill-defined or loaded questions,
inadequate space for recording answers, inappropriate sequencing of
items, etc.




7. One good technique for pretesting is to have respondents complete
the questionnaire. A discussion can then be held in which respondents
read each question aloud and then tell you what it means. Any
difficulties at all should be a cause for concern and revision.
Pretest methodology can be strengthened if discussions are tape
recorded, and suggestions for modifications are systematically
coded. This is especially useful when pretesting a questionnaire
that will be administered by interview.

8. During pretesting, the respondents should be encouraged to make
marginal notes on the questionnaire regarding sentence structure,
unclear questions or statements, etc. Pretests will provide a good
idea as to the length of time it takes to complete the
questionnaire.

9. When attitude questions, especially, are being pretested,
individuals who may hold minority views should be included. This will
help identify loaded questions.

10. Pretests for the selection of verbal anchors are valuable in
building rating scale content validity and reliability. Rather than
employing anchors which seem appropriate, the anchors used in the
final scales should be selected as a result of analyses of pretests
of respondents similar to those who will be participating in the
final test.

11. While pretesting a questionnaire, a high proportion of respondents
giving no response or a "Don't know" response should be a cause for
concern. However, a low number of "Don't know" responses
(especially for multiple choice items) does not guarantee that the
question is good.

12. After pretesting, each question should be reviewed and its
inclusion in the questionnaire justified. Questions that do not add
significant information or that largely duplicate other questions
can profitably be eliminated. Quantitative item reduction
techniques will depend on the type of scale that is being used, e.g.,
Thurstone, Likert, Guttman, etc. A discussion of quantitative item
reduction techniques is outside the current scope of this manual.
Army personnel may check with the Army Research Institute Field
Unit closest to them for help in this area.
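
The check described in guideline 11 can be tallied mechanically once pretest answers are recorded. The following Python sketch is illustrative only: the item names, the answers, the "DK" code for "Don't know," and the 25 percent cut-off are all assumptions made for the example, not values given in this manual.

```python
from collections import Counter

# Hypothetical pretest data: one list of answers per item.
# None stands for no response; "DK" stands for "Don't know".
pretest_answers = {
    "Q1": ["A", "B", "DK", "A", None, "DK", "DK", "B"],
    "Q2": ["A", "A", "B", "C", "B", "A", "C", "B"],
    "Q3": ["DK", "A", "A", "B", "A", "B", "A", "A"],
}

# Assumed cut-off; the manual does not give a numeric value.
FLAG_THRESHOLD = 0.25


def nonsubstantive_rate(answers):
    """Return the share of answers that are blank or "Don't know"."""
    if not answers:
        return 0.0
    counts = Counter(answers)
    return (counts[None] + counts["DK"]) / len(answers)


def flag_items(data, threshold=FLAG_THRESHOLD):
    """Return the items whose blank/"Don't know" share exceeds the threshold."""
    return [item for item, answers in data.items()
            if nonsubstantive_rate(answers) > threshold]


flagged = flag_items(pretest_answers)
print(flagged)
```

As guideline 11 cautions, an item that is not flagged is not thereby shown to be a good question; the tally only points to items that deserve a closer look.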




XII-A  Page  1 
1 Jul 76


Chapter XII: Characteristics of Respondents
that Influence Questionnaire Results


A.  Overview 

This chapter discusses some characteristics of respondents that
influence questionnaire results. It therefore identifies some of the
principal sources of error in the reporting of observations and/or the
evaluation of performance in, for example, operational Army field
tests. Additional research is required, however, to determine their
relative contributions to error variance.

Sections XII-B and C present a discussion of various biases, response
sets, or other sources of error. There is some confusion in the
literature regarding the use of these terms, but they are similar. A
bias is: a tendency to deviate from a true value; a tendency to favor a
certain position or conclusion; or an attitude either for or against a
certain unproved hypothesis which prevents an individual from
evaluating the evidence correctly. A response set or response bias
refers to the tendency of a respondent to answer questions in a
particular way almost independent of the content of the questions. An
error is simply a mistake or departure from correctness.

Section XII-D addresses the effects of attitudes of respondents on
questionnaire results, while Section XII-E considers the effects of
demographic characteristics on responses.

One of the main purposes of this chapter is to alert the questionnaire
designer to some of the characteristics of respondents that influence
questionnaire results. There are ways that some of the biases and
errors can be controlled, but not all of them. And there appears to be
no easy way of detecting the influence of a response set or of
neutralizing it. More detailed identification and control methods are
areas needing further research.




XII-B  Page  1 
8 Mar 85
(s.  1  Jul  76) 

B. Social Desirability and Acquiescence Response Sets

Social desirability is a response set where persons answer according to
the norms they believe society supports. It is the tendency to agree
with items the respondents believe reflect socially desirable attitudes
in order to show themselves in a better light. Acquiescence response
set is the tendency to consistently agree, to say "Yes," or to say
"True." It is a general tendency to assent rather than dissent.
Although there have been a number of studies about each, a detailed
discussion of them is beyond the scope of this manual. (See P-77-2,
Questionnaire Construction Manual Annex, Literature Survey and
Bibliography; and P-85-J, Questionnaires: Literature Survey and
Bibliography.) Some comments about each are presented below.

1.  Social  Desirability  Response  Set 

a. Social desirability response set seems to operate whenever the
respondent has the opportunity to respond in terms of it. Some
believe that its effect is so powerful that respondents would
not tend to deviate from social norms in their answers even
though their behavior denied what they said.

b. Several authors have identified respondents with a high social
desirability response rate. They found these respondents to
give more "True" responses to neutral items, to be more
susceptible to social pressures, to be more likely introverts, and
to score higher on a "lie" scale.

c. Faking or responding with socially desirable answers which are
not true is part of the response set.

d. Anonymity fails to eliminate the social desirability response
set.

e. The forced choice instrument format has been studied for its
susceptibility to social desirability response set, a factor it
was intended to control. Some authors found the forced choice
method minimized the effects of social desirability, while
others think the factor still needs additional control. One
study concludes that in comparing different forced choice
formats, ambiguous items tend to be freer of social desirability
response set than positively or negatively worded items. In
any case, the evidence indicates that the social desirability
problem is sometimes less in forced choice formats than in
other item types such as graphic rating scales. Forced choice
formats may or may not reduce bias.

f.  Card  sorts  also  need  control  to  eliminate  social  desirability 
bias. 




g. Respondents may be confounding trait dimensions with response
alternatives on clinical instruments. There is some evidence
that respondents have a stronger tendency to select response
alternatives opposite in desirability when a socially
undesirable response alternative is presented first.

h. Procedures have been developed for controlling or balancing
social desirability by using loaded items in the questionnaire
and then adjusting the respondent's score. The social
desirability score from the loaded items can also be correlated with
each of the other items on the questionnaire. The responses on
those items with a statistically significant correlation can
then be corrected by moving the response one or more steps from
the socially desirable response to give a more accurate result.
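
The correlation procedure in item h can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the procedure used in the studies cited: the data are invented, the high end of a 1-to-5 scale is assumed to be the socially desirable end, a fixed correlation cut-off stands in for a formal significance test, and the correction is applied as a single step to every response on a flagged item.

```python
import math

# Hypothetical data for six respondents (not from the manual).
# sd_scores: each respondent's score on the loaded items.
sd_scores = [1, 2, 3, 4, 5, 6]
# responses: ratings on a 1-5 scale; 5 is ASSUMED to be the
# socially desirable end of the scale.
responses = {
    "Q1": [1, 2, 2, 4, 4, 5],
    "Q2": [3, 1, 4, 2, 5, 2],
}

R_CUTOFF = 0.7   # assumed stand-in for a formal significance test
SCALE_MIN = 1


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation, so no usable correlation
    return cov / math.sqrt(var_x * var_y)


def flag_items(scores, items, cutoff=R_CUTOFF):
    """Return items that correlate positively with the desirability score."""
    return [name for name, ratings in items.items()
            if pearson_r(scores, ratings) >= cutoff]


def shift_one_step(ratings, low=SCALE_MIN):
    """Move each rating one step away from the desirable (high) end."""
    return [max(low, r - 1) for r in ratings]


flagged = flag_items(sd_scores, responses)
corrected = {name: shift_one_step(r) if name in flagged else list(r)
             for name, r in responses.items()}
print(flagged, corrected)
```

In practice the size of the shift and the test of significance would be chosen by the analyst; the sketch only shows where each decision enters.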

2.  Acquiescence  Response  Set 

a. The acquiescence response set is defined as a behavioral
attitude by the respondents to agree and accept, even if they must
alter their original opinions to do so.

b. The acquiescence response set seems to operate especially when
statements are in the form of plausible generalities.

c. The response set may occur more with difficult than with easy
questionnaire material.

d. Acquiescence response set may be a personality trait.

e. There is a concern that social desirability and acquiescence
response sets may be related in such a way that an individual
with a tendency toward conformity will consistently reflect
both biases.

f. Controls for acquiescence response set have been researched.
Stating the question stem in a neutral manner may help minimize
acquiescence. The effects of acquiescence response set may
also be partially controlled by using two alternate
questionnaire forms with the question stated positively on half of the
forms and stated negatively on the other half. The balancing
of scales (e.g., equal number of positive and negative points)
may also be of value in counteracting acquiescence.
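
When two alternate forms are used as described in item f, the ratings from the negatively stated form are usually reverse-scored before the two forms are combined. The Python sketch below illustrates the arithmetic only; the 5-point agreement scale and the ratings are assumptions made for the example.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point agreement scale


def reverse_score(value, low=SCALE_MIN, high=SCALE_MAX):
    """Map a rating to its mirror image on the scale (1 <-> 5, 2 <-> 4)."""
    return low + high - value


def item_mean_across_forms(positive_form, negative_form):
    """Average one item over both forms after aligning the negative form."""
    aligned = list(positive_form) + [reverse_score(v) for v in negative_form]
    return sum(aligned) / len(aligned)


# Hypothetical ratings of the same item on the two forms.
positive_ratings = [4, 5, 4, 4]  # agreement with the positively stated version
negative_ratings = [3, 4, 2, 3]  # agreement with the negatively stated version

combined = item_mean_across_forms(positive_ratings, negative_ratings)
print(combined)
```

A respondent who tends to agree regardless of content raises the score on the positive form and, after reverse-scoring, lowers it on the negative form, so the tendency partially cancels in the combined figure.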




XII-C  Page  1 
8 Mar 85
(s.  1  Jul  76) 


C. 


This  section  notes  a  number  of  other  response  sets  or  errors  of  which 
the  questionnaire  developer  should  be  aware. 


1. Error of Central Tendency


Some respondents tend to avoid endpoints on a scale, and pick a
middle value regardless of their true feelings. It may be more
common when the respondents are not very familiar with whatever
they are being asked to rate. It may be counteracted by adjusting
the strength of the response alternatives so that there are greater
differences in meaning between alternatives near the ends of the
scale than between alternatives near the center.


a. In one study, responses tended to be toward the center of the
scale when item length increased (more than 17 words).
Respondents selected response alternatives toward the positive end of
the scale when item length was short (less than 17 words).
Items may be ambiguous to the respondent when they are long and
negatively worded. This appeared to influence respondents to
rate items toward the mid-range of the scale.


2. Extreme Response Set


On the other hand, some individuals tend to consistently select
exaggerated choices for positions. It can be recognized when a
respondent makes a pattern of answers which tend to be unevenly
distributed toward one or both ends of a scale. Research indicates
that this response set may be a personality characteristic.


a. Research evidence indicates that positively worded items
receive higher mean responses than negatively worded items.
There is the possibility that respondents prefer or agree with
positively worded items, and rate them higher.


b. For cross-cultural survey research, there is some evidence that
response style may vary from country to country. One study
concluded that there was a tendency by respondents in the
Philippines to use a positive response style, and by
respondents in Italy to use a negative response style.


3.  Halo  Effect 




Halo effect was originally defined as a tendency, when one is
estimating or rating a person with respect to a given trait, to be
influenced by some other trait or by one's general impression of
the person. It is, however, also applicable to ratings of things
other than people. For example, if field test evaluators know that a
particular weapon system did well in one phase of a test, they may
be influenced to give high ratings to the system in later test
phases, even when the system performs poorly.

a. Most studies of ways to control halo effect have dealt with
ratings of traits of personnel by other personnel, a matter not
of great concern in this manual. The forced choice technique
and Mixed Standard Scales minimize halo effect in some
situations. Ratings will also be less distorted if questionnaire
items are constructed so as to relate to clearly observable
aspects of behavior which do not overlap. It is doubtful that
the influence of halo effects can be completely eliminated from
the responses to any questionnaire.

b. Behavioral scales such as Behaviorally Anchored Rating Scales
(BARS), Behavioral Expectation Scales (BES), and Mixed Standard
Scales (MSS) have been developed to measure performance. There
is evidence that the use of behavioral scales in conjunction
with intensive training can reduce halo error. This
combination of behavioral scale and training appears to be more
effective than graphic rating scales, trait scales, and Likert
scales in reducing halo error. The length of the training
session appears to influence whether halo error will be
reduced. Training sessions of 5-minute duration have had little
impact on the quality of ratings. Training sessions of 3-hour
duration were found to reduce halo error. Intensive training
sessions may not reduce other types of rating errors even
though they tend to reduce halo error.

4.  Leniency  Error 

Leniency error refers to a general, constant tendency for a rater
to rate too high, in the direction of being too generous. It
appears similar to halo effect, except that it is independent of
the trait or factor being rated. Some raters have an opposite
tendency to rate too severely. In one study, respondents showed
less leniency error on Likert scales than on behavioral scales.
These findings may not be consistent across studies. In large
groups of raters, the opposite tendencies should balance out.




5.  Logical  Error 

Logical error is also similar to halo effect. It is due to the
fact that raters are likely to give similar ratings to traits or
items that seem logically related. For example, field test
evaluators may know that a counterattack was extremely successful; they
may therefore reason that command and control was also very
effective and should receive an equivalent high evaluation because a
successful counterattack is a function of good command and control.
Such reasoning assumes a dependence which may or may not be true.
Logical error may be avoided in part by asking for judgments of
objectively observable actions or behavior.

6.  Proximity  Error 

Proximity  error  occurs  when,  due  to  the  ordering  of  questionnaire 
items,  the  answer  to  one  item  results  in  an  answer  to  a  subsequent 
question  being  substantially  changed  from  what  it  would  otherwise 
have  been.  Little  is  known  about  its  influence  in  field  test 
situations;  most  research  in  this  area  has  concerned  the  rating  of 
personality  trait  variables. 

7.  Contrast  Error 


Contrast  error  refers  to  a  tendency  of  raters  to  rate  others  in  the 
opposite  direction  from  themselves  in  regard  to  a  trait.  Little 
research  has  been  done  on  this  source  of  error. 

8.  Feedback  Bias 


Research shows that if observers are informed of experimental
hypotheses, and if they receive daily feedback indicating how well
their data support the hypotheses, they will tend to report data
supporting those hypotheses, even when the reverse is true! This
bias does not seem to occur, however, when observers are informed
only of the experimental hypotheses with no follow-up. Taking
precautions to assure high levels of observer accuracy minimizes
the bias.
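
The patterns described in this section under error of central tendency and extreme response set can be screened for by counting how often each respondent chooses the midpoint or an endpoint of the scale. The Python sketch below is a simple illustration: the 1-to-5 scale, the respondent data, and the 60 percent cut-off are assumptions, and a flagged pattern may still reflect the respondent's true opinions.

```python
# Assumed cut-off; the manual does not define one.
STYLE_THRESHOLD = 0.6


def response_style(ratings, low=1, high=5):
    """Return (endpoint share, midpoint share) for one respondent."""
    n = len(ratings)
    midpoint = (low + high) / 2
    endpoint_share = sum(1 for r in ratings if r in (low, high)) / n
    midpoint_share = sum(1 for r in ratings if r == midpoint) / n
    return endpoint_share, midpoint_share


def classify(ratings, threshold=STYLE_THRESHOLD):
    """Label a respondent's pattern using the assumed cut-off."""
    endpoint_share, midpoint_share = response_style(ratings)
    if endpoint_share >= threshold:
        return "possible extreme response set"
    if midpoint_share >= threshold:
        return "possible central tendency"
    return "no pattern flagged"


# Hypothetical ratings from three respondents on eight items.
respondents = {
    "A": [1, 5, 5, 1, 5, 5, 1, 5],
    "B": [3, 3, 3, 3, 2, 3, 3, 4],
    "C": [2, 4, 3, 5, 1, 2, 4, 3],
}

labels = {name: classify(r) for name, r in respondents.items()}
print(labels)
```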




D.  Effects  of  General  Attitudes  of  Respondents 


XII-D Page 1
8 Mar 85
(s. 1 Jul 76)


Limited research has been conducted on how the attitudes of a
respondent influence questionnaire results. The following, however,
should be noted:

1. Respondents at times base their ratings not on what is observed but
on what they believed prior to the observation. Beliefs and
opinions may affect results.

2. It is generally believed that judges used as part of the process of
determining scale values can rate items without being influenced by
their own attitudes. There is also some evidence to the contrary.

3. Unstable or changing responses to questionnaires may be caused by
shifts in the mood of the respondent, relative values among the
possible choices, and the degree of interest present in the
question.

4. As questions become more ambiguous, responses normally become more
influenced by attitudes.

5. It may be desirable to revise a questionnaire when norms of groups
differ greatly from those with whom the questionnaire was pretested
or previously administered.




XII-E  Page  1 
8 Mar 85
(s.  1  Jul  76) 


E. Effects of Demographic Characteristics on Responses


Demographic characteristics have been shown to influence questionnaire
results. Similarities of such variables among respondents often tend
to be related to a response pattern. These variables include: age,
religion, sex, intelligence, marital status, parenthood, socioeconomic
class, nationality, race, urban or rural residence, income, rank, and
experience. Questionnaires should, therefore, be designed with the
respondents' background in mind. When there is a suspicion that
demographic characteristics may affect response, the data should be
analyzed by type of respondent.


1. Research indicates that the racial background of survey
interviewers does not seem to affect survey results when the questions
do not deal with racial stereotypes and are not threatening. For most
questionnaires administered by interview, it would be possible to
assign interviewers of different racial backgrounds regardless of
respondents' racial backgrounds.


2. Racial background has been known to influence rating errors on
performance measures. However, this phenomenon has not been observed
consistently.


3. Survey items which tend to be most sensitive to differences in
response pattern by gender are those dealing with sex role
stereotypes. Items that are relevant to technical background experience
which females may not have may yield gender-related response
patterns.


4. It was hypothesized in one study that females would rate items
according to social desirability. It was suggested that females
have a greater need for social approval, that they are more
impulsive than males, and that this would be reflected by male/female
differences in rating. The results indicated that there were no
significant differences between ratings by gender. Some studies
have found differences in rating by females, while other studies
have not. The overall effect of gender differences in response
pattern has little support. Rating characteristics identified by
gender are usually not enough to explain rating differences. Other
variables must be taken into account as well, such as education,
race, age, etc.

5. Respondents with low levels of education may be confused by items
constructed in absolute terms. This may result in rating the items
with an inconsistent response pattern. One should review
questionnaire items to ensure that the content is not ambiguous to
respondents with a low level of education.




6. Survey items which request an opinion regarding an obscure topic
area may elicit a "Don't know" response by respondents with higher
levels of education. Respondents with higher levels of education seem
to be more willing to admit they do not have knowledge of obscure
topics. For this type of question, respondents with less education
have a tendency to give an opinion. Respondents with low levels of
education do not appear to admit they don't know, but instead select a
response alternative to represent their opinion.

7. For questions which are not obscure, the "Don't know" response
alternative may be selected most frequently by respondents who have
the least amount of education. Individuals with less education appear
to be the most influenced when a "Don't know" response alternative is
included.

8. The age of respondents does not appear to influence their ability
to use different types of rating scales. However, the educational
level of respondents may affect the way in which different scales are
used.

9. The content of some items may be related to the historical
perspective of different age groups. For such items, response patterns
may differ according to the age of respondents.

10. Nonresponse to an entire survey or to specific items in a survey
remains a threat to the validity of survey results. Research indicates
that nonresponse rates are sometimes associated with age of
respondents. Item nonresponse rate may be reduced by eliminating
branching from surveys that include respondents approximately 60 years
and older.

11. When surveys are conducted by interview, it is important to be sure
that older interviewers (about 50 or over) are following the standard
format and interview guide. Of course, all interviewers need to
conduct standardized interviews. One study indicated that older
interviewers made more errors by conducting the interview in a
nonstandardized way. Possibly, the interviewers felt that their years
of experience afforded them the opportunity to probe questions more
thoroughly, and to somewhat modify the interview guide as they
progressed through the interview.




XIII-A Page 1
8  Mar  85 
(s.  1  Jul  76) 


Chapter XIII: Evaluating Questionnaire Results


A.  Overview 


An extended discussion on evaluating questionnaire results is currently
outside the scope of this manual on questionnaire development.
However, Army personnel may check with the Army Research
Institute-Field Unit closest to them for help in the areas of coding
and data analyses. There are some factors relating to the evaluation
of questionnaire results that should be noted since they may influence
how questionnaires are designed and developed. Section XIII-B
considers the scoring and coding of questionnaire responses, and
Section XIII-C contains some notes about data analyses.




XIII-B Page 1
8  Mar  85 
1  Jul  76 


B. Scoring Questionnaire Responses

1. Practical Considerations

a. Both time and money can be saved by planning the questionnaire in
line with scoring and tabulation requirements. The phrasing of
questions and their sequencing and layout affect tabulation time. For
example, it is advantageous to have data coded and entered for analysis
directly from edited questionnaires. Questionnaires consisting of only
closed-end items will have a lower level of error for data entry than
those with open-ended items.

Entering data directly from edited questionnaires is a more
cost-effective approach. However, there are some drawbacks, such as
greater difficulty in verifying the coding and greater data entry time
than when using a coding sheet.

b.  A  decision  should  be  made  ahead  of  time  regarding  whether  the 
data  will  be  tabulated  by  hand  or  machine. 

c. Response alternatives should be precoded whenever possible. Codes
for open-ended items are more difficult to construct than codes for
closed-end items. To develop open-ended item codes, list out possible
responses to the item. Pretest the questionnaire to classify responses
to open-ended items. Construct a classification system and code.
Pretest the code and revise as necessary. Develop a separate code for
responses that could not be fitted into the classification system
above.

d. Codes need to be developed which guide coders in assigning code
numbers to each answer. This includes the following: codes for missing
data for item nonresponse, codes for item responses that are uncodable
due to poor respondent performance, and a code for the "Don't know"
response alternative.

e. Code books are constructed to define, clarify, and amend codes used
during the coding process. Codes that have caused difficulty for the
coders should be noted, such as classification systems and codes for
open-ended items. Coders require training on the specifics of the
classification system and codes used for the study, and on the general
principles of coding.

f. Since it does not seem to matter whether items are scrambled or
arranged in blocks according to content, blocking may be preferred
because it makes hand scoring easier.

g. Telephone surveys now use Computer Assisted Telephone Interviewing
(CATI). These systems are still in experimental stages, and they
require extensive programming. Items are read off the CRT screen, and
telephone interviewers type respondent answers into a terminal for
direct data entry.

h.  See  Section  IX-E  regarding  the  use  of  answer  sheets. 
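
The coding steps in items c and d can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
reserved code values (90, 97, 98, 99), the example categories, and the
keyword matching that stands in for a trained coder's judgment are all
hypothetical and are not prescribed by this manual.

```python
# Hypothetical reserved codes; the manual does not prescribe values.
OTHER = 90       # open-ended answer that fits no category
MISSING = 97     # item nonresponse
UNCODABLE = 98   # response that cannot be interpreted
DONT_KNOW = 99   # "Don't know" response alternative

# Precoded alternatives for a hypothetical closed-end item.
CLOSED_CODES = {"yes": 1, "no": 2, "don't know": DONT_KNOW}

# Hypothetical classification system for one open-ended item, of the
# kind that might be built from pretest answers. Keyword matching is
# only a stand-in for a human coder applying the code book.
OPEN_CATEGORIES = {
    10: ("heavy", "weight"),
    20: ("manual", "instructions"),
    30: ("battery", "power"),
}


def code_closed(answer):
    """Return the numeric code for a closed-end response."""
    if answer is None or not answer.strip():
        return MISSING
    return CLOSED_CODES.get(answer.strip().lower(), UNCODABLE)


def code_open(answer):
    """Return the category code for an open-ended response."""
    if answer is None or not answer.strip():
        return MISSING
    text = answer.lower()
    for code, keywords in OPEN_CATEGORIES.items():
        if any(word in text for word in keywords):
            return code
    return OTHER  # separate code for answers outside the system


if __name__ == "__main__":
    for raw in ["Yes", "", "maybe", "Don't know"]:
        print(repr(raw), "->", code_closed(raw))
    print(code_open("Too heavy to carry"))
```

Keeping the reserved codes well away from the substantive codes makes
them easy to exclude or tabulate separately during analysis.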




2.  Other  Considerations 


a. There may be a justification for scoring rating scale items
dichotomously according to the direction of response. It is sometimes
done when bipolar scales are analyzed in terms of the proportion of
responses in either direction of the basic dichotomy. The
justification is based upon results that seem to indicate that
composite scores reflect primarily the direction of responses and only
to a minor extent their intensities.

b. One investigator found that many Likert-type rating scales
consisting of 2 through 19 steps may be collapsed into two or three
measurement categories for analysis with no loss of precision.

c. When working with paired comparison items with a "No preference"
option, the "No preference" responses can often be either divided in
proportion to the preference responses, or disregarded altogether. The
basis for this suggestion is that respondents who claim neutrality
appear to exhibit the same preference patterns as those who express a
preference.

d. By using any one of several methods of scoring or transforming
self-rating scale raw scores, it is usually possible to approximate
dichotomous forced choice results with considerable saving in
administration time, and a small gain in test-retest reliability.

e. Investigators sometimes use intensity scores as well as rating
scale content scores. One way of obtaining an intensity score is to
follow each question with the query, "How strongly do you feel about
this?" A second way involves weighting extreme responses (positive and
negative) as 2, moderate responses as 1, and neutral responses as 0.
These weights can then be summed for an intensity score.
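
The scoring rules in items a, c, and e can be illustrated with a short
Python sketch. It assumes a five-point bipolar scale coded 1 through 5
with 3 as the neutral point; the function names and the example counts
are hypothetical and are not taken from the studies summarized above.

```python
def dichotomize(rating, midpoint=3):
    """Direction of response: +1 above the midpoint, -1 below, 0 at it."""
    if rating > midpoint:
        return 1
    if rating < midpoint:
        return -1
    return 0


def intensity_score(ratings, midpoint=3):
    """Sum of intensity weights.

    On a 1-5 scale the distance from the midpoint gives the weights
    described in item e: extreme = 2, moderate = 1, neutral = 0.
    """
    return sum(abs(r - midpoint) for r in ratings)


def split_no_preference(prefer_a, prefer_b, no_preference):
    """Divide "No preference" counts in proportion to stated preferences."""
    expressed = prefer_a + prefer_b
    if expressed == 0:
        # Nobody expressed a preference, so split evenly.
        return no_preference / 2, no_preference / 2
    share_a = prefer_a / expressed
    adjusted_a = prefer_a + no_preference * share_a
    adjusted_b = prefer_b + no_preference * (1 - share_a)
    return adjusted_a, adjusted_b


if __name__ == "__main__":
    ratings = [5, 4, 3, 2, 1, 5]
    print([dichotomize(r) for r in ratings])
    print(intensity_score(ratings))
    print(split_no_preference(60, 20, 20))
```

Disregarding the "No preference" responses altogether, the other option
in item c, yields the same proportions as the proportional split,
because both keep the ratio of stated preferences unchanged.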




XIII-C Page 1
8  Mar  85 
(s.  1  Jul  76) 


C.  Data  Analyses 

A detailed discussion of data analyses is beyond the scope of this
manual; however, some basic data analysis issues have been mentioned in
related chapters. The following points are also noted:

1. Analyses of questionnaire responses are chiefly of two types:
summary tabulations and statistical analyses. Tabulations are used
primarily for the presentation of results. Statistical tests are used
to determine whether the differences in the results are significant.
Statistical literature is available which presents numerous tests
usable in such analyses.

2. As part of the questionnaire development process, tentative (dummy)
analysis tables should be developed to assure that the data to be
obtained are appropriate.

3. Weights can be assigned to questionnaires when there is a
probability that the selection of respondents is not representative of
the population as a whole. For example, a sample drawn from a list of
service personnel receiving training, and enrolled in various courses,
may result in unequal probability sampling. Since the subjects may be
enrolled in more than one course, the more courses they take, the
greater the chance they will be selected into the sample.

Weights are also used in making adjustments for total nonresponse and
in poststratification. They are able to assign greater importance to
some sampled elements than to others in the data analysis.
Poststratification conforms the sample distribution to the known
population distribution. The sample distribution is adjusted across
the strata. This is useful when the population distribution is known,
but the stratum of each sample element cannot be determined at the
selection stage. In such situations, prior stratification is not
employable, although poststratification may be applied later. When a
sample is weighted to a known population, the weighting adjusts for the
sampling fluctuations, as well as for nonresponse. For example, if
nonresponse is higher for a specific age group, the sample will conform
to the known age distribution when weighted. The development of
weights is a difficult task. Standard computer programs for weighted
data can be applied in data analysis.

4. Four kinds of measurement scales have been identified: nominal,
ordinal, interval, and ratio. Appropriate statistical analyses are
associated with each. Hence, the data analysis limitations of various
forms of questionnaires should be considered before an instrument is
designed. For example, less can be done statistically with open-ended
questions than with ranking questions.
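
The two uses of weights described in item 3 can be sketched as follows.
This is a minimal illustration, not a recommended procedure: it assumes
that a person's chance of selection is proportional to the number of
courses in which he or she is enrolled, and that the population share of
each stratum is known. The function names, the age groups, and the
counts are hypothetical.

```python
from collections import Counter


def selection_weights(courses_enrolled):
    """Inverse-probability weights for a sample drawn from course lists.

    Assumes a person enrolled in k courses is k times as likely to be
    drawn, so each response is weighted by 1/k.
    """
    return [1.0 / k for k in courses_enrolled]


def poststratification_weights(sample_strata, population_shares):
    """Weight = population share of the stratum / its sample share."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return [population_shares[s] / (counts[s] / n) for s in sample_strata]


if __name__ == "__main__":
    print(selection_weights([1, 2, 4]))

    # Hypothetical sample in which the younger group responded less often.
    sample = ["under 30"] * 2 + ["30 and over"] * 6
    known = {"under 30": 0.5, "30 and over": 0.5}
    weights = poststratification_weights(sample, known)

    total = sum(weights)
    young = sum(w for w, s in zip(weights, sample) if s == "under 30")
    print(young / total)  # weighted share of the "under 30" group
```

After weighting, the share of each age group in the sample matches the
known population share, which is the adjustment for differential
nonresponse described in the text.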




XIV-A  Page  1 
1  Jul  76 


Chapter XIV: Interview Considerations


A.  Overview 

If properly used, the interview is an effective means of obtaining
data. It is a technique in which an individual is questioned by a
skilled and trained interviewer who records all replies, preferably
verbatim in most cases. Most of the principles of questionnaire
construction discussed in previous chapters pertain to the interview as
well. This chapter, however, notes some issues specifically related to
interviews.

Section XIV-B presents the distinction between structured and
unstructured interviews. Interviewer's characteristics relative to the
interviewee are noted in Section XIV-C. Situational factors are noted
in Section XIV-D, while the topics of Sections XIV-E, F, and G are,
respectively, training interviewers, data recording and reduction, and
special problems. There is, unfortunately, little that can be
recommended to avoid some of the problems noted in this chapter. The
questionnaire developer should, in any case, be aware of them.




XIV-B  Page  1 
1  Jul  76 


B.  Structured  and  Unstructured  Interviews 

The term "structured" when applied to interviews is intended to
emphasize that the interviewer employs a script of all the questions to
be asked. In the unstructured interview, the interviewers may know
many of the topics to be covered but they need to learn more about the
subject overall, so they are willing to be led by the interviewee even
into digressions. Unstructured interviews may occur as a preliminary
to preparing either a questionnaire or a structured interview script.
One could use a questionnaire as the script for a structured interview
if one already had the questionnaire developed, but not enough time to
convert it to a more convenient format. The main difference between
the structured interview and questionnaire is procedural.

The degree of proficiency required of interviewers in conducting an
unstructured interview is generally not available during Army field
test evaluations. A structured interview requires the interviewer to
have only moderate skill and proficiency, and hence is usually
preferred. The advantages of the structured interview include: the
opportunity to probe for all the facts when the respondent gives only a
partial or incomplete response; a chance to ensure that the question is
thoroughly understood by the respondent; and an opportunity to pursue
other problem areas which may arise during an interview. The
structured interview is almost always preferable to a questionnaire
when the test group is small (10 to 20), and when time and test
conditions permit.

As noted in Section II-B, unstructured interviews are not included
within the definition of questionnaire used in this manual. They are,
therefore, not discussed further.




XIV-C  Page  1 
8 Mar 85
(s. 1 Jul 76)


C.  Interviewer's  Characteristics  Relative  to  Interviewee 

More research is needed to identify how characteristics of an
interviewer affect the respondent. Some areas of concern are presented
below.

1. Rank, Grade, or Status of the Interviewer

For Army field test evaluations, it is recommended that the interviewer
be of similar rank or grade to the individuals being interviewed. A
difference in rank or grade introduces a bias in the data which has
been found to substantially influence test results. Interviewees tend
to give the answer they perceive the higher-ranking interviewer favors.
When the interviewer is of lower grade, the interviewee may not show
respect and may not cooperate.

Evidence indicates that the greater the disparity between the status of
the interviewer and that of the respondent, the greater the tendency
for biased responses. Respondents tend to provide answers that will be
more favorably received by the interviewer.

Data suggest that in the interview situation the respondent tends to
support the norms adhered to by the interviewer. Lower socioeconomic
respondents may defer to the norms represented by a higher-status
interviewer. The effect, however, is related to the types of questions
asked. Sensitive issues involving socially accepted or rejected
answers will produce more bias.

2.  Sex  of  the  Interviewer 

Differences  in  response  patterns  according  to  the  interviewer's  sex 
depend  on  subject  matter  as  well  as  on  the  composition  of  the 
respondent  populations  and  other  characteristics  of  the  specific 
survey  situation.  Subject  matter  which  tends  to  be  most  sensitive 
to  differences  in  male/female  response  patterns  deals  with  gender 
stereotypes.  Interview  items  used  in  performance  appraisals  may  be 
sensitive  to  sex  role  stereotypes.  It  is  recommended  that  this 
type  of  item  be  investigated  for  rating  differences  between  males 
and  females.  Interview  items  that  are  relevant  to  technical  back¬ 
ground  experience  (not  usually  obtained  by  females)  also  show 
gender  response  differences. 

3.  Race  of  the  Interviewer 


The effects of the race of the interviewer on the respondent should
probably  be  viewed  as  the  result  of  interaction  between  interviewer 
and  respondent  characteristics,  or  the  result  of  the  item  content. 
Respondents  often  give  socially-desirable  answers  to  interviewers 
whose  race  differs  from  theirs,  particularly  if  the  interviewee's 
social  status  is  lower  than  that  of  the  interviewer  and  the  topic 
of  the  question  is  threatening. 



Nonsensitive, nonracial items appear to be relatively immune to
interviewer effects for racial background. Therefore, racial
background of the interviewer does not usually seem to affect survey
results. It would be possible to assign interviewers of different
racial background regardless of the respondent's racial background. An
interviewer's race can probably establish different frames of reference
for items with racially-related content. For threatening items or
items with racially-related content, more valid results might be
expected when the interviewer is of the same race as the respondent.

4.  Experience  of  the  Interviewer 

There may be no significant differences between interview completion
rates for experienced and inexperienced interviewers who have received
sufficient interviewer training for face-to-face interviews and
telephone interviews. However, it has been found that experienced
interviewers may have different error rates than inexperienced
interviewers. This error rate has been associated with the age of the
interviewer, and the amount of interviewer training. Interviewer error
is usually controlled through selection and training. Older
interviewers (age 55 and over) have been known to frequently deviate
from interviewer guides. Younger interviewers were found to follow the
interview guides more closely. Nonstandardized administration of the
interview could jeopardize the overall standardization of the survey
procedures.

Other evidence indicates that field interviewers trained for less than
a day produce more survey errors than more highly trained interviewers.
Individuals responsible for developing interview items, guides, and
training require sufficient development time prior to administration of
the interview. The use of interview techniques that increase
standardization has been known to improve through training. Response
rates for telephone interviews may also be increased through training.




XIV-D  Page  1 
8  Mar  85 
(s. 1 Jul 76)

D. Situational Factors

Among the situational factors that should be considered when interviews
are used are the following:

1. It helps greatly if the interviewees perceive the interviewer as
interested in hearing their comments, as willing to listen, and (if
the situation requires) as willing to protect them from recrimination
for being adverse in their evaluations.

2. Interviews should be conducted in a quiet, temperature-controlled
environment where the respondent can be comfortable and relaxed. Each
respondent should be interviewed in private, separate and apart from
all others, so that no other person hears or is biased by his/her
responses.

3. The reinforcing behaviors of the interviewer have an influence on
the responses collected, and at times may cause respondents to change
their preferences. Such comments as "good" or "fine" and such actions
as smiling and nodding can have a decided effect on test results.
Praised respondents normally offer more answers than unpraised ones.
Praising respondents may also tend to reduce "Don't know" answers
without increasing insincere or dishonest responses.

4. Interested respondents seem to be more subject to interviewer
effects than uninterested ones.

5. Interview questions which are read slowly indicate to respondents
that they can take their time in carefully and thoughtfully answering
the question. Rushing through an interview may reduce accuracy.

6. Use a "focus" group or pilot screening as a way to develop
hypotheses and refine questions for establishing an interview guide and
interview items. Interview guides are to be followed so that questions
are asked without any wording changes. This promotes standardization
across interviews.

7. Incomplete answers to survey questions require nondirective
probing. When asking for clarification regarding an incomplete answer,
the interviewer is not to direct the respondent toward any one
response. Instead, phrases such as "tell me more" would be useful to
employ.

8. Recording answers to interviews that use closed-end questions
requires only that the interviewer mark the answer that the respondent
selects.

9. When recording answers to open-ended questions, use a tape recorder
if the respondent agrees or write down the answers verbatim. It is
possible to combine open-ended and closed-end items for interview
questionnaires, although coding and recording may be more difficult for
the open-ended items.


10. For telephone surveys, use an interview structure and interview
guide that promote a high level of interaction between the interviewer
and the respondent. This may be useful in increasing response rate.

11. Response cards can be adapted from face-to-face interviews for
telephone surveys. Oral labeling of the scale points should be
assessed on a pilot survey to be sure that the responses are not biased
by the oral presentation of the scale.


XIV-E Page 1
8 Mar 85
(s. 1 Jul 76)

E.  Training  Interviewers 

Generally, interviewers require a certain amount of training. Army
personnel may check with the Army Research Institute-Field Unit closest
to them for help in this area. Some of the factors which should be
considered when training interviewers are the following:

1. Training sessions for interviewers usually range between two days
and five days. Interviewers conducting field interviews require more
training than individuals who conduct telephone interviews. A minimum
of two days and up to five days of training are recommended for
face-to-face interviews.

2. Sometimes researchers provide interviewers with information about
the general research goals, sampling procedures, data analysis, and
reports that will result from the survey.

3. Interviewer training requires general information in the course
content, such as how to introduce the study, as well as more specific
information. Interviewers need to be familiar with the wording used in
the survey, and any branching instructions. Standardizing the study in
how questions are asked, incomplete answers are probed, and answers are
recorded is an important aspect of the course content.

4. Interviewer training usually incorporates a demonstration of the
standardized interview, and exercises where trainees role-play both the
respondent and the interviewer. Practice sessions may also be tape
recorded.




XIV-F  Page  1 
1  Jul  76 


F.  Data  Recording  and  Reduction 

In the structured interview, both questions and answers are orally
communicated. The interviewer may encode the answers on paper, or tape
record the responses for later encoding (but only if the interviewee
agrees to the taping and does not seem influenced by the presence of a
recording device).

Other topics related to interview data recording and reduction are
outside the scope of this manual.




XIV-G Page 1
8 Mar 85
(s. 1 Jul 76)


G. Special Interviewer Problems

This section notes some special problems related to interviews.

When interviews are used, the qualified interviewer will avoid leading,
pressuring, or influencing the direction of an interviewee's
evaluations. If potential interviewers have strong preferences
regarding the system(s) being tested, they should probably be
disqualified.

Many studies have been conducted that show other biasing effects on the
interviewer. Factors leading to significant effects of the interviewer
upon results include: relatively high ambiguity in the wording of the
inquiry; interviewer "resistance" to a given question; and resistance
to additional questioning or probing. Interviewer bias can exist
without being apparent, and the direction of bias is not necessarily
uniform. The least interviewer bias is probably found with questions
that can be answered "Yes" or "No." The bias can result from
differences in interviewing methods, differences in the degree of
success in eliciting factual information, and differences in
classifying the respondent's answers. Interviewers' expectations may
have a more powerful effect on the results than their ideological
preferences.

Some interviewers have a tendency not to transmit printed instructions
word for word. Hence, entire phrases may be eliminated, and key words
originally intended to focus the respondent's attention on some
specific point are omitted or changed. Key ideas are lost, mainly
through omission. Interviewer performance seems to vary both across
interviewers and within individuals.

An interviewer's attitude toward a question can communicate itself
sufficiently to the respondent so that the meaning of the question is
altered. Interviewers being trained to deliver a questionnaire in a
standardized fashion need to rehearse the questions for tone of voice
and body language to reduce any interviewer bias.




